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Preface 



The scribes didn't have a large enough set from which to determine 
patterns. 

Brandon Sanderson 

The Hero of Ages 



This partial solution manual to our book Introducing Monte Carlo Methods 
with R, published by Springer Verlag in the User R! series, on December 2009, 
has been compiled both from our own solutions and from homeworks written 
by the following Paris-Dauphine students in the 2009-2010 Master in Statis- 
tical Information Processing (TSI): Thomas Bredillet, Anne Sabourin, and 
Jiazi Tang. Whenever appropriate, the R code of those students has been 
identified by a # (C . ) Name in the text. We are grateful to those students for 
allowing us to use their solutions. A few solutions in Chapter 4 are also taken 
verbatim from the solution manual to Monte Carlo Statistical Methods com- 
piled by Roberto Casarin from the University of Brescia (and only available 
to instructors from Springer Verlag). 

We also incorporated in this manual indications about some typos found 
in the first printing that came to our attention while composing this solu- 
tion manual have been indicated as well. Following the new "print on de- 
mand" strategy of Springes' Verlag, these typos will not be found in the 
versions of the book purchased in the coming months and should thus be 
ignored. (Christian Robert's book webpage at Universite Paris-Dauphine 
www . ceremade . dauphine . f r/~xicin/books . html is a better reference for the 
"complete" hst of typos.) 

Reproducing the warning Jean-Michel Marin and Christian P. Robert 
wrote at the start of the solution manual to Bayesian Core, let us stress 
here that some self-study readers of Introducing Monte Carlo Methods with R 
may come to the realisation that the solutions provided here are too sketchy 
for them because the way we wrote those solutions assumes some minimal 
familiarity with the maths, the probability theory and with the statistics be- 
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hind the arguments. There is unfortunately a limit to the time and to the 

efforts wc can put in this solution manual and studying Introducing Monte 
Carlo Methods with R requires some prerequisites in maths (such as matrix 
algebra and Riemann integrals), in probability theory (such as the use of joint 
and conditional densities) and some bases of statistics (such as the notions of 
inference, sufficiency and confidence sets) that we cannot cover here. Casella 
and Berger (2001) is a good reference in case a reader is lost with the "basic" 
concepts or sketchy math derivations. 

We obviously welcome solutions, comments and questions on possibly er- 
roneous or ambiguous solutions, as well as suggestions for more elegant or 
more complete solutions: since this manual is distributed both freely and in- 
dependently from the book, it can be updated and corrected [almost] in real 
time! Note however that the R codes given in the following pages are not opti- 
mised because wc prefer to use simple and understandable codes, rather than 
condensed and efficient codes, both for time constraints and for pedagogical 
purposes: some codes were written by our students. Therefore, if you find 
better [meaning, more efficient/faster] codes than those provided along those 
pages, we would be glad to hear from you, but that does not mean that we will 
automatically substitute your R code for the current one, because readability 
is also an important factor. 

A final request: this manual comes in two versions, one corresponding to 
the odd-numbered exercises and freely available to everyone, and another one 
corresponding to a larger collection of exercises and with restricted access 
to instructors only. Duplication and dissemination of the more extensive "in- 
structors only" version are obviously prohibited since, if the solutions to most 
exercises become freely available, the appeal of using our book as a textbook 
will be severely reduced. Therefore, if you happen to possess an extended ver- 
sion of the manual, please refrain from distributing it and from reproducing 
it. 

Sceaux and Gainesville Christian P. Robert and George Casella 
January 17, 2010 
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Basic R programming 



Exercise 1.1 
Self-explanatory. 

Exercise 1.3 
Self-explanatory. 

Exercise 1.5 

One problem is the way in which R handles parentheses. So 

> n=10 

> l:n 

produces 

123456789 10 

but 

> n=10 

> l:n-l 

produces 

0123456789 

since the 1 : 10 command is executed first, then 1 is subtracted. 

The command seq(l ,n-l ,by=l) operates just as 1 : (n-1) . If n is less than 
1 we can use something like seq( 1 , . 05 , by=- . 01) . Try it, and try some other 
variations. 
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Exercise 1.7 

a. To bootstrap the data you can use the code 

Boot=2500 

B=array(0,dim=c(nBoot, 1)) 
for (i in l:nBoot){ 

ystar=scanple (y , replace=T) 

B [i] =ineaii(ystar) 

} 

The quantile can be estimated with sort (B) [ . 95*nBoot] , which in our 
case/sample is 5.8478. 

b. To get a confidence interval requires a double bootstrap. That is, for each 
bootstrap sample we can get a point estimate of the 95% quantile. We can 
then run an histogram on these quantiles with hist, and get their upper 
and lower quantiles for a confidence region. 

nBootl=1000 
nBoot2=1000 

Bl=array(0,dim=c(nBootl, 1)) 
B2=array(0,dim=c(nBoot2, 1)) 
for (i in l:nBootl){ 

ystar=sample (y , replace=T) 

for (j in l:nBoot2) 

B2 [ j ] =mean (sample (ystar , replace=T) ) 

Bl [i] =sort (B2) [ . 95*nBoot2] 

> 

A 90% confidence interval is given by 

> c(sort(Bl) [.05*nBootl] , sort(Bl) [.95*nBootl] ) 
[1] 4.731 6.844 

or alternatively 

> quantile (Bl,c(. 05, .95)) 

5% 95% 
4.731 6.844 

for the data in the book. The command hist(Bl) will give a histogram 
of the values. 

Exercise 1.9 

If you type 
> mecin 

function (x, . . . ) 

UseMethod ( "mean" ) 
<environment : namespace : base> 
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you do not get any information about the function mean because it is not 
written in R, while 

> sd 

function (x, na.rm = FALSE) 
{ 

if (is .matrix (x) ) 

apply (x, 2, sd, na.rm = na.rm) 
else if (is.vector(x)) 

sqrt(var(x, na.rm = na.rm)) 
else if (is.data.freime(x)) 

sapplyCx, sd, na.rm = na.rm) 
else sqrt (var (as. vector (x) , na.rm = na.rm)) 

} 

shows sd is written in R. The same apphes to var and cov. 
Exercise 1.11 

When looking at the description of attach, you can see that this command 
allows to use variables or functions that arc in a database rather than in the 
current .RData. Those objects can be temporarily modified without altering 
their original format. (This is a fragile command that we do not personaly 
recommend!) 

The function assign is also rather fragile, but it allows for the creation 
and assignment of an arbitrary number of objects, as in the documentation 
example: 

for(i in 1:6) { #— Create objects 'r.l', 'r.2', ... 'r.6' — 
nam <- paste ("r",i, sep=".") 
assign(nam, l:i) 

> 

which allows to manipulate the r.l, r.2, variables. 
Exercise 1.13 

This is mostly self-explanatory. If you type the help on each of those functions, 
you will see examples on how they work. The most recommended R function 
for saving R objects is save. Note that, when using write, the description 

states 

The data (usually a matrix) 'x' are written to file 
'file'. If 'x' is a two-dimensional matrix you need 
to transpose it to get the columns in 'file' the same 
as those in the internal representation. 

Note also that dump and sink are fairly involved and should use with caution. 
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Exercise 1.15 

Take, for example a=3;x=c(l ,2,3,4,5) to see that they are the same, and, 
in fact, are the same as niax(which(x == a)). For y=c(3,4,5,6,7,8), try 
match(x,y) and match(y,x) to see the difference. In contrast, x7,in%y and 
y7.in7oy return true/false tests. 

Exercise 1.17 

Running system. time on the three sets of commands give 

a. 0.004 0.000 0.071 

b. 

c. 0.000 0.000 0.001 

and the vectorial allocation is therefore the fastest. 

Exercise 1.19 

The R code is 

> A=matrix(runif (4) ,ncol=2) 

> A=A/apply(A,l,sum) 

> apply ( A7.*"/oA , 1 , sum) 
[1] 1 1 

> B=A;for (t in 1:100) B=B%*7.B 

> apply (B,l, sum) 
[1] Inf Inf 

and it shows that numerical inaccuracies in the product leads to the property 
to fail when the power is high enough. 

Exercise 1.21 

The function xyplot is part of the lattice library. Then 

> xyplot (age ~ circumference, data=OrcLiige) 

> barchartCage ~ circumference, data=Orange) 

> bwplotCage ~ circumference, data=Oraiige) 

> dotplotCage ~ circumference, data=Orange) 

produce different representations of the datasct. Fitting a linear model is 
simply done by lm(age ~ circumference , data=Orange) and using the tree 
index as an extra covariate leads to 
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> summary dm (age ~ circumf erence+Tree, data=Oraiige) ) 
Coefficients : 

Estimate Std. Error t value Pr(>|t|) 
(Intercept) -90.0596 55.5795 -1.620 0.116 
circumference 8.7366 0.4354 20.066 < 2e-16 *** 
Tree.L -348.8982 54.9975 -6.344 6.23e-07 *** 

Tree.Q -22.0154 52.1881 -0.422 0.676 

Tree.C 72.2267 52.3006 1.381 0.178 

Tree~4 41.0233 52.2167 0.786 0.438 

meaning that only Tree .L was significant. 



Exercise 1.23 

a. A plain representation is 
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where empty slots are represented by zeros. 

b. A simple cleaning of non-empty (i.e. certain) slots is 

for (i in 1:9) 
for (j in 1:9){ 

if (s[i,j]>0) pool[i,j,-s[i,j]]=FALSE 

} 

c. In R, matrices (and arrays) are also considered as vectors. Hence s [i] 
represents the (1 + [{i — 1)/9J , (i — 1) mod 9 + 1) entry of the grid. 

d. This is self-explanatory. For instance, 

> a=2;b=5 

> boxa 
[1] 12 3 

> boxb 
[1] 4 5 6 
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e. The first loop checks whether or not, for each remaining possible integer, 
there exists an identical entry in the same row, in the same column or 
in the same box. The second command sets entries for which only one 
possible integer remains to this integer. 

f. A plain R program solving the grid is 

while (sum(s==0)>0){ 

for (i in sample(l:81)){ 
if (s[i]==0){ 

a=((i-l)7,7,9)+l 
b=trunc((i-l)/9)+l 
boxa=3*trunc ( (a-1) /3) +1 
boxa=boxa: (boxa+2) 
boxb=3*trunc ( (b- 1 ) /3) +1 
boxb=boxb: (boxb+2) 

for (u in (1 : 9) [pool [a,b,] ] ){ 

pool[a,b,u] = (sum(u==s[a,] )+sum(u==s [,b] ) 
+sum (u==s [boxa , boxb] ) ) ==0 

} 

if (sumCpool [a, b,] )==!){ 
s[i]=(l:9) [pool[a,b,]] 
} 

if (sumCpool [a, b,] )==0){ 
print ( "wrong sudoku" ) 
break 
} 

} 

> 

} 

and it stops with the outcome 
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which is the solved Sudoku. 
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Random Variable Generation 



Exercise 2.1 

For a random variable X with cdf F, if 

F-{u) = inf{a;, F(a;) < u], 

then, for U ~ U[Q, 1], for all y e M, 

P(f-(C/) < y) = P(inf{a;,F(a;) <U}<y) 

= F{F{y) >U) as is non-decreasing 
= F{y) as U is uniform 

Exercise 2.3 

a. It is easy to see that E[C/i] = 0, and a standard calculation shows that 
var(J7i) = 1/12, from which the result follows. 

b. Histograms show that the tails of the 12 uniforms are not long enough. 
Consider the code 

nsini=10000 
ul=runif (nsim) 
u2=runif (nsim) 

Xl=sqrt (-2*log(ul) ) *cos (2*pi*u2) 
X2=sqrt (-2*log(ul) ) *sin(2*pi*u2) 
U=array (0 ,dim=c(nsim, 1) ) 

for(i in l:nsim)U[i]=suin(runif (12,-.5, .5)) 

par (mf row=c (1,2)) 

hist (XI) 

hist (U) 

a=3 

mean(Xl>a) 
inecin(U>a) 
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mean (rnorm (nsim) >a) 
l-pnonii(a) 

c. You should sec the difference in the tails of the histogram. Also, the nu- 
merical output from the above is 

[1] 0.0016 
[1] 5e-04 
[1] 0.0013 
[1] 0.001349898 

where we see that the Box-MuUer and rnorm are very good when compared 
with the exact pnorm. Try this calculation for a range of nsim and a. 

Exercise 2.5 

For U ~ ^0,1]) Y ^ din); s-nd X ~ f{x), such that f /g < M, the acceptance 
condition in the Accept-Reject algorithm is that U < f{Y)/{Mg{Y)). The 
probability of acceptance is thus 
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Assume f /g is only known up to a normalising constant, i.e. //g = k.f /g, 
with f/g< M, M being a well-defined upper bound different from M because 
of the missing normalising constants. Since F ~ 5, 




1 



kM 



Therefore the missing constant is given by 




which can be estimated from the empirical acceptance rate. 



2 Random Variable Generation 11 

Exercise 2.7 

The ratio is equal to 

r{a + i3) r{a)r(b) 



r{a)r{i3) r{a + b) 



a;"-" (1 - xf-'' 



and it will not diverge at x = only if a < a and at a; = 1 only if & < /3. The 
maximum is attained for 

a — a (3 — b 
X* 1 — x* ' 

i.e. is 



Ma,b 



r{a + f3) r{a)r{b) (a-a)"""(/3-6)'3-'' 
r{a)r{j3) r{a + b) {a-a + P- 6)"-«+/3-6 ' 



The analytic study of this quantity as a function of (o, b) is quite delicate but 
if we define 

mab=f unction(a,b) { 

Igainma(a) +lgajiima(b) + (alph-a) *log (alph-a) + (beta-b) *log(beta-b) 
-(alph+bet-a-b)*log(alph+bet-a-b)> 

it is easy to see using contour on a sequence of a's and 6's that the maximum 
of Mafi is achieved over integer values when a = [a\ and b= [^\. 



Exercise 2.9 



Given 9, exiting the loop is driven hy X = xq, which indeed has a probability 
f{xo\0) to occur. If X is a discrete random variable, this is truly a probability, 
while, if X is a continuous random variable, this is zero. The distribution of 
the exiting 6 is then dependent on the event X = xq taking place, i.e. is 
proportional to ■K{9)f{xo\9), which is exactly Tr{9\xo). 



Exercise 2.11 

a. Try the R code 

nsinK-5000 
n=25;p=.2; 

cp=pbinom(c(0:n) ,n,p) 
X=array (0 , c (nsim, 1) ) 
for(i in l:nsim){ 

u=runif (1) 

X[i]=siim(cp<u) 

} 

hist(X,freq=F) 

linesd :n,dbinoin(l :n,n,p) ,lwd=2) 
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which produces a histogram and a mass function for the binomial B{25, .2). 
To check timing, create the function 

MYbinom<-f unction (sO ,nO ,pO) { 
cp=pbinom(c (0 :nO) ,nO,pO) 
X=array(0,c(sO,l)) 
for (i in l:sO){ 

u=runif (1) 

X[i]=siiin(cp<u) 

} 

return (X) 
} 

and use system. time (rbinom (5000, 25, .2)) and system. time (MYbinom (5000, 25, 

to sec how much faster R is. 
b. Create the R functions Wait and Trans: 

Wait <-f unction (sO , alpha) { 
U=array(0,c(sO,l)) 
for (i in l:sO){ 
u=runif (1) 

while (u > alpha) u=runif(l) 

U[i]=u 

} 

return (U) 
} 

Trans<-f unction(sO , alpha) { 
U=array(0,c(sO,l)) 

for (i in l:sO) U[i]=alpha*runif (1) 

return (U) 

} 

Use hist (Wait (1000, .5)) and hist (Trans (1000 ,. 5) ) to see the corre- 
sponding histograms. Vary n and a. Use the system. time command as 
in part a to see the timing. In particular, Wait is very bad if a is small. 

Exercise 2.13 

The cdf of the Pareto V{a) distribution is 

F{x) = l-x-" 

over (l,oo). Therefore, F~^{U) = (1 — U)~^^", which is also the — 1/a power 
of a uniform variate. 
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Exercise 2.15 

Define the R functions 

PoisK-f unction(sO , IconO) {. 
spread=3*sqrt (lamO) 

t=round(seq(max(0,lainO-spread) ,lamO+spread, 1) ) 
prob=ppois (t , lamO) 
X=rep(0,sO) 
for (i in l:sO)-[ 
u=runif (1) 

X[i]=max(t [1] ,0)+siim(prob<u)-l 

} 

return (X) 
} 

and 

Pois2<-f unction(sO , lamO) { 
X=rep(0,sO) 
for (i in l:sO){ 
sum=0;k=l 

sum=sum+rexp(l , lamO) 

while (suin<l)-[ sum=sum+rexp(l,lamO) ;k=k+l}- 

X[i]=k 

} 

return (X) 
} 

and then run the commands 

> nsini=100 

> lambda=3.4 

> system. time(Poisl(nsim, lambda)) 

user system elapsed 
0.004 0.000 0.005 

> system. time(Pois2(nsim, lambda)) 

user system elapsed 
0.004 0.000 0.004 

> system. time(rpois(nsim, lambda)) 

user system elapsed 


for other values of nsim and lambda. You will see that rpois is by far the 
best, with the exponential generator (Pois2) not being very good for large 
A's. Note also that Poisl is not appropriate for small A's since it could then 
return negative values. 
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Exercise 2.17 

a. Since, ii X ^ Ga{a, /3), then f3X — P-^j ^ Go.{<^i l)i is the inverse 
of a scale parameter. 

b. The Accept-Reject ratio is given by 



f{x) x^-^e-"" 



g{x) A e 

The maximum of this ratio is obtained for 

n— 1 n—l 
^^-(l-A)-O, I.e. for x = • 

Therefore, 

and this upper bound is minimised in A when A = 1/ri. 
c. If g is the density of the Ga{a, b) distribution and / the density of the 
Ga{a, 1) distribution, 

^a—l^—bx^a 
di^) = FT^ ^^'^ /(^) 



r{a) ' r{a) 

the Accept-Reject ratio is given by 



Therefore, 



g{x) r{a)h°-x''-^e-^'' 



= 5°e-^(i-^)x"-"-i {(a - a) - (1 - h)x} 

ox g 



provides x* — a — ajl — has the argument of the maximum of the ratio, 
since -(0) = 0. The upper bound M is thus given by 



It obviously requires 6 < 1 and a < a. 
d. Warning: there is a typo in the text of the first printing, it should 
be: 

^ Show that the maximum of h~°'{l — h)°-~°' is attained at 6 = a/a, and 
hence the optimal choice of b for simulating Ga{oi, 1) is 6 = a/a, which 
^ gives the same mean for both Gci{a, 1) and Gci{a,b). 
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With this modification, the maximum of M(a, b) in b is obtained by deriva- 
tion, i.e. for b sohition of 

a a — a 
-b-—b-'^ 

which leads to 6 = a/a as the optimal choice of b. Both Qa{a^\) and 
Qa{a,a/a) have the same mean a. 
e. Since 



M(a, a/a) = [a/a)"' — = {a/ay-a^-- = a"/a^ , 

\ (1 — a/a)e J 

M is decreasing in a and the largest possible value is indeed a = [aj . 
Exercise 2.19 



The ratio f /g is 

/(x) _ exp{-xV2}/^/2^ _ 
g{x) aexpj— Q!|a::|}/2 a 



exp{a|a;| — /2} 



and it is maximal when x = ±a, so M = y277rexp{a2/2}/a. Taking the 
derivative in a leads to the equation 



a- \ 



that is, indeed, to a = 1. 



Exercise 2.21 



Warning: There is a typo in this exercise, it should be: 



(i) . a mixture representation (2.2), where g{x\y) is the density of X^+2v ^'^'^ 

p{y) is the density of 7'(A/2), and 

(ii) . the sum of a Xp-i random variable and the square of a Af{V A, 1). 

a. Show that both those representations hold. 

b. Compare the corresponding algorithms that can be derived from these 
representations among themselves and also with rchisq for small and 



large values of A. 




If we use the definition of the noncentral chi squared distribution, XpW ^.s 
corresponding to the distribution of the squared norm | |a;| ^ of a normal vector 
X ~ Mp{9, Ip) when A — W^W^ , this distribution is invariant by rotation over the 
normal vector and it is therefore the same as when x ^ A/^((0, . . . , 0, \/A), Ip), 
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hence leading to the representation (ii), i.e. as a sum of a Xp-i random variable 
and of the square of a A/'(| |0| | , 1) variable. Representation (i) holds by a regular 
mathematical argument based on the series expansion of the modified Bessel 
function since the density of a non-central chi-squared distribution is 



Since rchisq includes an optional non-ccntrality parameter nc, it can be 
used to simulate directly a noncentral chi-squared distribution. The two sce- 
narios (i) and (ii) lead to the following R codes. 

> sy St em . t ime ( {x=r chi sq(10"6,df=5, ncp=3 ) > ) 

user system elapsed 

> system. time ({x=rchisq(10"6,df =4)+rnorm(10~6 ,meaji=sqrt (3) ) ~2}) 

user system elapsed 
1.700 0.056 1.757 

> system. t ime ({x=rchisq(10~6,df=5+2*rpois(10~6, 3/2))}) 

user system elapsed 
1.168 0.048 1.221 

Repeated experiments with other values of p and A lead to the same conclu- 
sion that the Poisson mixture representation is the fastest. 



Exercise 2.23 

Since the ratio n{0\x)/Tr{9) is the likelihood, it is obvious that the optimal 
bound M is the likelihood function evaluated at the MLE (assuming tt is a 
true density and not an improper prior). 

Simulating from the posterior can then be done via 

theta0=3 ; n=100 ; N=10~4 
x=rnorm(n)+thetaO 

lik=f unction (the) {prod (dnorm (x , mean=the) ) } 
M=optimise(f=f unction (the)-[prod (dnorm(x,mean=the) )}, 

int=range(x) ,max=T)$obj 
theta=rcauchy (N) 

res=(M*runif (N)>apply(as . matrix (theta) , 1 ,lik) ) ; print (sum (res) /N) 

while (sum(res) >0) {le=sum(res) ; theta [res] =rcauchy(le) 

res [res] =(M*runif (le) >apply (as .matrix (theta [res] ) , 1 , lik) )} 

The rejection rate is given by 0.9785, which means that the Cauchy proposal 
is quite inefficient. An empirical confidence (or credible) interval at the level 

95% on 9 is (2.73, 3.799). Repeating the experiment with n = 100 leads (after 
a while) to the interval (2.994,3.321), there is therefore an improvement. 



/(a;|A) = 2(^/A)(^^'^/^/(p-2)/2(v^)e-(^+^)/2 



where 
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Monte Carlo Integration 



Exercise 3.1 

a. The plot of the integrands follows from a simple R program: 

f l=function(t){ t/(l+t*t)*exp(-(x-t) ~2/2)> 

f 2=f uiictioii(t) { 1/ (l+t*t) *exp (- (x-t) ~2/2) } 

plot (f 1 , -3 , 3 , col=l , ylim=c (-0 .5,1), xlab="t " , ylab=" " , ty="l " ) 

plot (f 2 , -3 , 3 , add=TRUE , col=2 , ty= " 1 " ) 

legendC'topright" , c("f l=t . f 2" , "f 2") , lty=l,col=l :2) 

Both numerator and denominator are expectations under the Cauchy dis- 
tribution. They can therefore be approximated directly by 

Niter=10"4 

co=r cauchy (Miter) 

I=mean(co*dnorm(co ,me£La=x) ) /me£ai(dnorm(co ,mean=x) ) 
We thus get 

> x=0 

> mean ( co*dnorin (co, inean=x) ) /mean (dnorm (go, meaii=x) ) 
[1] 0.01724 

> x=2 

> mean(co*dnorm(co ,mean=x) ) /mean(dnorm(co ,meaii=x) ) 
[1] 1.295652 

> x=4 

> mean(co*dnorm(co ,mean=x) ) /mean(dnorm(co ,mean=x) ) 
[1] 3.107256 

b. Plotting the convergence of those integrands can be done via 

# (C.) Anne Sabourin, 2009 
xl=dnorm(co,mean=x) 

estint2=cumsum(xl) / (1 : Niter) 

esterr2=sqrt (cumsum( (xl-estint2) "2) ) / (1 : Niter) 
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xl=co*xl 

estiiitl=cumsum(xl) )/(l :Niter) 

esterr2=sqrt (cumsum( (xl-estintl) ~2) ) / (1 : Niter) 
par (mf row=c (1,2)) 

plot (est intl,type="l" ,xlab=" iteration" ,ylab=" " , col="gold") 
lines (estint l-2*esterr 1 , lty=2 , lwd=2) 
lines (estint l+2*esterrl , lty=2 , lwd=2) 

plot (est int2,type="l" ,xlab=" iteration" ,ylab=" " , col="gold") 
lines (estint2-2*esterr2 , lty=2 , lwd=2) 
lines (estint2+2*esterr2 , lty=2 , lwd=2) 

Because we have not yet discussed the evaluation of the error for a 
ratio of estimators, wc consider both terms of the ratio separately. 
The empirical variances a are given by var (co*dnorin(co,m=x) ) and 
var(diiorin(co,in=x)) and solving 2a/^/n < 10~^ leads to an evaluation 
of the number of simulations necessary to get 3 digits of accuracy. 

> x=0 ;max(4*var (dnorm(co ,m=x) ) *10~6, 
+ 4*var(co*dnorin(co,m=x))*10~6) 

[1] 97182.02 

> x=2; 4*10~6*inax(var(dnonii(co,m=x)) ,var(co*dnorin(co,in=x))) 
[1] 220778.1 

> x=4; 10"6*4*inax(var(dnonii(co,m=x)) ,var(co*dnorin(co,in=x))) 
[1] 306877.9 

c. A similar implementation applies for the normal simulation, replacing 
dnorm with dcauchy in the above. The comparison is clear in that the 
required number of normal simulations when a; = 4 is 1398.22, to compare 
with the above 306878. 

Exercise 3.3 

Due to the identity 

/■°°exp(-^) /-i/^o exp(-^) 

P(X>20 = / ' dx= ^ ^*^' 20du, 

J 20 \/2^ Jo 20w2^ 

we can see this integral as an expectation under the U{0, 1/20) distribution 
and thus use a Monte Carlo approximation to F{X > 20). The following R 
code monitors the convergence of the corresponding approximation. 

# (C.) Thomas Bredillet, 2009 

h=function(x){ l/(x"2*sqrt(2*pi)*exp(l/(2*x"2)))} 
par (mf row=c (2,1)) 

curve (h,from=0,to=l/20,xlab="x" ,ylab="h(x) " ,lwd="2") 
I=l/20*h(runif (10"4)/20) 

estint=cumsum(I)/(l : 10"4) 

esterr=sqrt (cumsum( (I-estint) "2))/(l : 10"4) 
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plot ( est int , xlab= " Iterat ions " , ty= " 1 " , lwd=2 , 
ylim=mean(I)+20*c(-esterr [10~4] , esterr [10~4] ) ,ylab="") 
lines (est int+2*esterr , col= "gold" , lwd=2) 
lines (estint-2*esterr , col="gold" , lwd=2) 

The estimated probability is 2.505e— 89 with an error of ±3.61e— 90, compared 
with 

> int egrate(h,0, 1/20) 

2.759158e-89 with absolute error < 5.4e-89 

> pnorm(-20) 

[1] 2.7536246-89 



Exercise 3.5 

Warning: due to the (late) inclusion of an extra-exercise in the book, 
the "above exercise" actually means Exercise 3.3!!! 

When Z ~ A/'(0, 1), with density /, the quantity of interest is 9{Z > 4.5), 
i.e. ]E-''[Iz>4.5]. When g is the density of the exponential £xp{X) distribution 
truncated at 4.5, 

_ ly>4.5Aexp(-Ay) _ ^^(^_4.5) 

J_4 5Aexp(-Aj/)dj/ 

simulating iid F^'^ 's from g is straightforward. Given that the indicator func- 
tion Iy>4.5 is then always equal to 1, F{Z > 4.5) is estimated by 

_ 1 " /(yW) 

"~ n^5(yW)■ 
A corresponding estimator of its variance is 

1=1 

The following R code monitors the convergence of the estimator (with A = 
1,10) 

# (C.) Anne Sabourin, 2009 
Nsim=5*10"4 
x=rexp(Nsim) 
par (mf col=c (1,3)) 
for (la in c(.5,5,50)){ 
y=(x/la)+4.5 

weit=dnorm(y) / dexp (y-4 . 5 , la) 
est=cumsum(weit) / (1 : Nsim) 
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varest=cumsmn( (1-est) "2*weit/ (1 : Nsim) ~2) 

plot (est, type="l" ,ylini=c(3e-6,4e-6) ,main="P(X>4 . 5) estimate" , 
sub=paste( "based on E(",la,") simulations" ,sep="") ,xlab="" ,ylab="") 
abline (a=pnorm(-4. 5) ,b=0 , col="red") 
> 

When evaluating the impact of A on the variance (and hence on the conver- 
gence) of the estimator, similar graphs can be plotted for different values of A. 
This experiment does not exhibit a clear pattern, even though large values of 
A, like A = 20 appear to slow down convergence very much. Figure 3.1 shows 
the output of such a comparison. Picking A = 5 seems however to produce a 
very stable approximation of the tail probability. 



P(X>4.5) estimate 



P(X>4.5) estimate 



P(X>4.5) estimate 
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Fig. 3.1. Comparison of three importance sampling approximations to the normal 
tail probability ¥{Z > 4.5) based on a truncated £xp{X) distribution with A = 
.5, 5.50. The straight red line is the true value. 



Exercise 3.7 

While the expectation of — x) is well defined for v > 1/2, the integral 

of x/(l — a:) against the t density does not exist for any ly. Using an importance 



3 Monte Carlo Integration 21 



sampling representation, 

J 1-x g{x) 

if g{l) is finite. The integral will be finite around 1 when 1/(1 — t)g{t) is in- 
tcgrablc, which means that g(t) can go to infinity at any rate. For instance, 
if g{t) « (1 — t)~" around 1, any a > is acceptable. 

Exercise 3.9 

As in Exercise 3.1, the quantity of interest is S'^{x) = E''{6\x) = J 6TT{d\x) dO 
where x ~ J^{0, 1) and ^ C(0, 1). The target distribution is 

7r(^|a;)oc7r(e)e-(^-^)'/2 = /,(e). 

A possible importance function is the prior distribution, 

and for every 9 G < when M = tt. Therefore, generating from 

the prior g and accepting simulations according to the Accept-Reject ratio 
provides a sample from 7r(0|x). The empirical mean of this sample is then 
a converging estimator oi¥j^{0\x). Furthermore, we directly deduce the esti- 
mation error for 5. A graphical evaluation of the convergence is given by the 
following R program: 

f=function(t){ exp(-(t-3)"2/2)/(l+t"2)} 
M=pi 

Msiiii=2500 

postdist=rep (0 , Nsim) 
for (i in l:Nsim)-[ 

u=riin.if (1)*M 

postdist [i] =rcauchy (1) 

while (u>f (postdist [i] ) / dcauchy (postdist [i] ) ) { 
u=runif (1)*M 
postdist [i]=rcauchy(l) 
}} 

estdelta=cuinsuin (postdist) / (1 : Nsim) 

esterrd=sqrt (cumsum( (postdist-estdelta) "2) ) / (1 : Nsim) 
par (mf row=c (1,2)) 

Cl=matrix(c(estdelta,estdelta+2*esterrd,estdelta-2*esterrd) ,ncol=3) 
matplot(Cl,ylim=c(l .5,3) ,type="l" ,xlab=" Iterations" ,ylab=" ") 
plot (esterrd , type=" 1 " , xlab=" Iterations " , ylab=" " ) 
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Exercise 3.11 

a. If X ~ £xp{l) then for x > a, 



px—a px 

'[a + X <x]= / exp(-t) dt= exp(-t + a)dt = P(F < a;) 

Ja 



when y ~ £'xp+(a, 1), 
b. If X ~ xi, then 



p+oo 9-3/2 

P(X > 25) = y ,xi/2 exp(-a;/2) da; 



Jl2.5 



y^{x)exp{-12.5) 



3 exp(— x + 12.5) dx . 

-'"'(2) 



'12.5 -f U 

The corresponding R code 

# (C.) Thomas Bredilllet, 2009 

h=f unction (x) { exp (-x) *sqrt (x) /gamma (3/2) > 

X = rexp(10~4,l) + 12.5 

I=exp (-12 . 5) *sqrt (X) /gainma(3/2) 

estint=cumsum(I)/(l : 10"4) 

esterr=sqrt (cumsum( (I-estint) ~2) ) / (1 : 10~4) 
plot (estint ,xlab="Iterations" , ty="l" , lwd=2 , 
ylim=mean(I)+20*c(-esterr [10~4] , esterr [10"4] ) ,ylab="") 
lines (estint+2*esterr , col="gold" , lwd=2) 
lines (estint-2*esterr , col="gold" , lwd=2) 

gives an evaluation of the probability as 1.543e — 05 with a 10~^ error, to 
compare with 

> integrate (h, 12 . 5 , Inf) 

1.544033e-05 with absolute error < 3.4e-06 

> pchisq(25,3,low=F) 
[1] 1.544050e-05 

Similarly, when X t^, then 

W(X > 50) = f —= ^^^^ exp(-t + 50) At 

Ao ^5*7r)r(2,5)(l + f)3expH + 50) 

and a corresponding R code 

# (C.) Thomas Bredilllet, 2009 

h=f unction(x) { 1/sqrt (5*pi) *gamma(3) /gamma(2 . 5) *1/ ( l+x~2/5) ~3} 
integrate (h , 50 , Inf ) 
X = rexp(10"4,l) + 50 

I=l/sqrt(5*pi)*gamma(3)/gamma(2.5)*l/(l+X"2/5)"3*l/exp(-X+50) 



3 Monte Carlo Integration 23 



estint=ciimsmn(I) / (1 : 10~4) 

esterr=sqrt (cumsumC (I-estint) ~2) ) /(I : 10~4) 

plot (estint ,xlab="Mean and error range" ,type="l" ,lwd=2, 

ylim=mean(I)+20*c(-esterr [10~4] , esterr [10'4] ) ,ylab=" ") 

lines (estint+2*esterr , col="gold" , lwd=2) 

lines (est int-2*est err , col="gold" , lwd=2) 

As seen on the graph, this method induces jumps in the convergence 
patterns. Those jumps are indicative of variance problems, as should be 
since the estimator does not have a finite variance in this case. The value 
returned by this approach differs from alternatives evaluations: 

> meaind) 

[1] 1.529655e-08 

> sd(l)/10-2 

[1] 9.328338e-10 

> integrate(h,50,lnf ) 

3.023564e-08 with absolute error < 2e-08 

> pt(50,5,low=F) 
[1] 3.023879e-08 

and cannot be trusted. 

Warning: There is a missing line in the text of this question, 
which should read: 

Explore the gain in efficiency from this method. Take a = 4.5 in part (a) 
and run an experiment to determine how many normal Af(0, 1) random 
variables would be needed to calculate P{Z > 4.5) to the same accuracy 
obtained from using 100 random variables in this importance sampler. ^ 

If we use the representation 

/>00 /"OO 

P(Z > 4.5) = / ip{z)dz= / (^(x + 4.5) exp(a;) exp(-a;) dx , 

J 4.5 Jo 

the approximation based on 100 realisations from an £xp{l) distribution, 
xiiTi . . . , xiOO, is 

^ 100 

— ^ (^(a;, + 4.5) exp(a;,) 



100 

i=l 



and the R code 



> x=rexp(100) 

> mean(dnorm(x+4.5)*exp(x)) 
[1] 2.817864e-06 

> var(dnorm(x+4.5)*exp(x))/100 
[1] 1.544983e-13 
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shows that the variance of the resulting estimator is about 10~^^. A simple 

simulation of a normal sample of size m and the resulting accounting of 
the portion of the sample above 4.5 leads to a binomial estimator with a 
variance of V{Z > 4.5)P(Z < 4.5) /m, which results in a lower bound 



i.e. close to ten million simulations. 
Exercise 3.13 

For the three choices, the importance weights arc easily computed: 

xl=sainple (c (-1 , 1) , 10~4,rep=T) *rexp(10"4) 
wl=exp(-sqrt (abs (xl) ) ) *sin(xl) "2* (xl>0) / . 5*dexp(xl) 

x2=rcauchy(10"4)*2 

w2=exp (-sqrt (abs (x2) ) ) *sin(x2) ~2* (x2>0) /dcauchy (x2/2) 
x3=rnorin(10~4) 

w3=exp (-sqrt (abs (x3) ) ) *sin(x3) "2* (x3>0) /dnorm(x3) 

They can be evaluated in many ways, from 

boxplot (as . data . f rsmie (cbind (wl , w2 , w3) ) ) 

to computing the effective sample size l/sum( (wl/suin(wl) ) ~2) introduced 
in Example 3.10. The preferable choice is then gi. The estimated sizes are 
given by 

> 4*10~6*var (xl*wl/sum(wl) )/mean(xl*wl/sum(wl) ) "2 
[1] 10332203 

> 4*10~6*var (x2*w2/sum(w2) )/mean(x2*w2/sum(w2) ) "2 

[1] 43686697 

> 4*10~6*var (x3*w3/ sum(w3) )/mean(x3*w3/sum(w3) ) "2 
[1] 352952159 

again showing the appeal of using the double exponential proposal. (Note that 
efficiency could be doubled by considering the absolute values of the simula- 
tions.) 



Exercise 3.15 

a. With a positive density g and the representation 



m>F{Z> 4.5)P(Z < 4.5)/1.510- 



,-13 



0.7510^ 




we can simulate ^i's from g to approximate m{x) with 
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b. When ^(a;) = T:{e\x) = f{x\6)TT{e)lK, then 

f{x,\e)^{e) 

and the normaHsation constant is the exact estimate. If the normahsing 
constant is unknown, we must use instead the self-normahsing version 
(3.7). 

c. Since 

m , . . ^ f t{9) f{x\0)7rie) ^ ^ 
f{x\e)7r{0) ^ ' ^ f{x\eMe) m{x) m{x) ' 

we have an unbiased estimator of l/fn{x) based on simulations from the 
posterior, 

T f{x\e*)7r{e*) 

and hence a converging (if biased) estimator of 'm{x). This estimator of 
the marginal density can then be seen as an harmonic mean estimator, 
but also as an importance sampling estimator (Robert and Marin, 2010). 

Exercise 3.17 

Warning: There is a typo in question b, which should read 

Let X\Y = y - ^^(1, y) and Y ~ £3:^(1). 

a. If (Xi, Yi) ~ fxY{x, y), the Strong Law of Large Numbers tells us that 

^ \^ fxY{x*,yi)w{x^) f f fxY{x*,y)w(x) 

lim-> ^ = / / ^ fxY{x,y)dxdy. 

" '^jli JXY[Xt,yi) J J fxY[x,y) 

Now cancel fxY{x, y) and use that fact that / w{x)dx = 1 to show 

/ / f-f*:y^'"l''b xYix,y)dxay ^ I fxY{x\y)dy = fx{x*). 
J J jxY(x,y) J 

b. The exact marginal is 



J [ye-y^] e-ydy = J ; 



7(2) 



(1+X)2- 

We tried the following R version of Monte Carlo marginalization: 

X=rep(0,nsim) 
Y=rep(0,nsim) 
for (i in l:nsim){ 
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Y[i]=rexp(l) 

X [i] =rgainma(l , 1 , rate=Y [i] ) 
> 



MCMarg=f unction (x , X , Y) { 

return (meaoi ( (dgammaCx, 1 ,rate=Y) /dgammaCX, 1 , 
rate=Y) ) *dgamma (X , 7 , rate=3) ) ) 

} 

True=f unction (x) (1+x) " (-2) 

which uses a Ga{7, 3) distribution to marginahze. It works ok, as you can 
check by looking at the plot 

> xplot=seq(0,5, .05) ;plot(xplot,MCMarg(xplot,X,Y)-True(xplot)) 

c. Choosing w{x) = fx{x) leads to the estimator 



n 

n. ^ 



i=l 



fxYix*,yi)fx{xi 
fxY{xi,yi) 



1 " 



i=l 



fx{.X*)fY\x{yt\x*)fx{Xi) 
fx{Xi)fY\x{yi\Xi) 



^ ^^n^^fY\xiyi\xi) 

which produces fx{x*) modulo an estimate of 1. If we decompose the 
variance of the estimator in terms of 



var < E 



fxY{x*,yi)w{xi) 



fxY{xi,yi) 
the first term is 



Xi 



E<^ var 



fxY{x*,yi)w{xi) 



fxY{xi,yi) 



E 



fxY{x*,yi)w{x^) 
fxY{xi,yi) 



= fx{x*)^ 
= fx{x*) 



fY\x{yi\x*) 
fY\x{yi\xi) 

w{Xi) 



w{Xi) 
fx{Xi) 



fx{Xi) 



which has zero variance if u){x) = fx{x). If we apply a variation calculus 
argument to the whole quantity, we end up with 



w{x) oc fx{x) 



/y|x(2/l^*) 

fY\x(.y\x) 



dy 



minimizing the variance of the resulting estimator. So it is likely fx is not 
optimal... 
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Exercise 4.1 

a. Since 

TTi{e\x) = ni{e)/ci and TX2{0\X) = ^2{0)/c2 , 

where only tti and W2 are known and where Ci and C2 correspond to the 
marginal likelihoods, mi (a:) and 7712(2;) (the dependence on x is removed 
for simplification purposes), we have that 

^ m2{x) U,^TT2{e).f2{x\e)de Je, f^2{e) m2(x) ' 

and therefore #i (0)/#2 (^) is an unbiased estimator of g when 9 ^ tt2{9\x). 

b. Quite similarly, 

/7fi(6')Q(6')7r2(6i|a:)d6' _ J 7ri{e)a{e)Tt2{e)/c2de _ ci _ 
J 7r2{9)a{e)ni{e\x)d9 ~ J n2{6)a{6)T:i{6) / cxd9 ~~^2~^' 

Exercise 4.3 

We have 

ESSn = 




(This is also a consequence of Jensen's inequality when considering that the 
sum up to one.) Moreover, the last equality shows that 

with equality if and only if a single is different from zero. 
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Exercise 4.5 

Warning: There is a slight typo in the above in that should not 
be in bold. It should thus read 

Establish that 




cov{Xk, Xk') = (T^/max{fc. k'} 



Since the X^'s are iid, for k' < k, we have 



cov{Xk, Xk') = cov I 

1=1 i=l 



i=l i=l 
\i=l i=l j 



kk 

__ 1 
~ kk' 
= <j^/k 

= I maxj/c, fc'} . 

Exercise 4.9 

Warning: There is a missing variance term in this exercise, which 
should read 

Show that 



E[exp-X2|y] = / exp{-a;2} e^^{-[x - y.fyl2a''}Ax 



^ V2a2/y + 1 '''''' I l + 2aVy 

by completing the square in the exponent to evaluate the integral. 
We have 

2x^ + {x- fify/2a^ = x^{2 + ya^"^) - 2xfiya^^ + i-i^ya^^ 

= {2 + ya-^) [x - [iyu-^l{2 + yu-^)\ ' + 
[ya-^-y'a-^/{2 + ya-^)] 

= (2 + [x - + 2aVy)] ' + + ^^Vv) 
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and thus 



/ 



exp{-x^} exp{-(a; - iify/2a'^} ^^^^^jy 



exp 



1 + 2a2/y 



j exp{-(2 + 2/a-2)[x-M/(l + 2aVy)]V2} '^'^ 



exp<^-- 



= exp 



1 1 



l + 2a2/j/J y'l + 2c72/t/ 



Exercise 4.11 



Since H{U) and H{lu) take opposite values when if is monotone, i.e. one 
is large when the other is small, those two random variables are negatively 
correlated. 



Exercise 4.13 

Warning: Another reference problem in this exercise: Exercise 4.2 
should be Exercise 4.1. 

a. The ratio (4.9) is a ratio of convergent estimators of the numerator and 
the denominator in question b of Exercise 4.1 when 9^ ~ 'it\{9\x) and 
62% ~ '!^2{9\x). (Note that the wording of this question is vague in that it 
does not indicate the dependence on x.) 

b. If we consider the special choice a{9) = 1 / tt\{9)tt2{9) in the representation 
of question b of Exercise 4.1, we do obtain q = E'^2[#2(6')"^]/E''i 
assuming both expectations exist. Given that (i = 1,2) 

E-[f,(0)-i]= / J-^d9, 
Je ■Ki{9) mi{x) 

this implies that the space O must have a finite measure. If d^ represents 
the dominating measure, Q is necessarily compact. 

Exercise 4.15 

Each of those R programs compare the range of the Monte Carlo estimates 
with and without Rao BlackwcUization: 

a. For the negative binomial mean, E/(X) = a/6 since X ~ Afeg{a, b/ (6+1)). 
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y=inatrix(rgaimna(100*Nsim,a) /b,ncol=100) 
x=matrix(rpois(100*Nsim,y) ,ncol=100) 

matplot (apply (x, 2, cumsum) / (1 :Nsim) ,type="l" , col="grey80" , 
lty=l , ylim=c ( . 4*a/b , 2*a/b) , xlab=" " , ylab=" " ) 
matplot (apply (y, 2, cumsum)/ (1 :Nsim) ,type="l" , col="grey40" , 
lty=l , add=T , xlab= " " , ylab= " " ) 
abline (h=a/b , col="gold" , lty=2 , lwd=2) 

b. For the generalized t variable, Ef{X) — BEf[X\Y] = 0. So the improve- 
ment is obvious. To make a more sensible comparison, we consider instead 
Ef[X^] = E[Y] = a/b. 

y=matrix(rgcmfflia(100*Nsim,a) /b,ncol=100) 
x=matrix(rnorm(100*Nsim, sd=sqrt (y) ) ,ncol=100) 
matplot (apply(x~2,2,cumsum)/(l:Nsim) ,type="l" , col="grey80" , 
lty=l,ylim=(a/b)*c(.2,2) , xlab=" " ,ylab=" " ) 
matplot (apply (y ,2, cumsum) / (1 :Nsim) ,type="l" , col="grey40" , 
lty=l , add=T , xlab= " " , ylab= " " ) 
abline (h=a/b , col="gold" , lty=2 , lwd=2) 

c. Warning: There is a typo in this question with a missing n in 
the Bin(y) distribution... It should be 

^|c. X\y ~ Bin{n,y), Y ~ Be{a,b) {X is beta-binomial)] 

In this case, ]E/[X] ~ nEf[Y] — na/{a + b). 

y=l/matrix(l+rgamma(100*Nsim,b)/rgamma(100*Nsim, a) ,ncol=100) 
x=matrix(rbinom(100*Nsim,n,prob=y) ,ncol=100) 

matplot (apply (x, 2, cumsum) / (1 :Nsim) ,type="l" , col="grey80" , lty=l , 
ylim=(n*a/(a+b))*c(.2,2) , xlab=" " ,ylab=" " ) 

matplot (n*apply (y , 2 , cumsum) / (1 : Ns im) , type= " 1 " , col= " grey40 " , lty=l , add=T , 
xlab="",ylab="") 

abline (h=n*a/(a+b) ,col="gold" ,lty=2,lwd=2) 



Exercise 4.17 

It should be clear from display (4.5) that we only need to delete the ri^ (fc^ 
in the current notation). We replace it with 2fc^ and add the last row and 
column as in (4.5). 



Exercise 4.19 



a. For the accept-reject algorithm. 
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{X,,...,X^)^ fix) 
(Ui,. . . , Un) ^0,1] 
{Yu...,Yr,)'-'^^- g{y) 

f(Y- ) 

and the acceptance weights are the Wj = N is the stopping time 

associated with these variables, that is, — X^- We have 

Pi = P{Ui<Wi\N = n,Y^,...,Yr,) 
^ P{Ui<Wi,N = n,Yi,...,Yr,) 
P{N = n,Yi,...,Yr,) 

where the numerator is the probability that Ijv is accepted as Xm, Yi is 
accepted as one Xj and there are (m — 2) Xj 's that are chosen from the 
remaining (n — 2) Y^'s. Since 

P {Yj is accepted) = P {Uj < wj) = Wj , 

the numerator is 

m-2 n-2 

n "^i^ n 

(il,...,Sm-2) J = l j=m-l 

where 

i) rij^^''^«, the probability that among the TV in addition to 
both Y/v and Yi being accepted, there are (m — 2) other Yj 's accepted 
as Xf's; 

ii) nj=m-i(l ~ ■^ij) probability that there are (n — m) rejected 
Yj 's, given that Y^ and Yat are accepted; 

iii) the sum is over all subsets of {1, . . . ,i — l,i + 1, . . . ,n) since, except 
for Yi and Yat, other (m — 2) Y^ 's are chosen uniformly from (n — 2) 
Y/s. 

Similarly the denominator 

rn—l 71 — 1 

P{N = n,Yu...,Yr,) = w, J2 U^^^Ui^-^n) 

is the probability that Yjv is accepted as X„ and (m — 1) other X^'s are 
chosen from (n — 1) Y^'s. Thus 

Pi = P(C/, <w,|7V = n,Yi,...,Y„) 

^ ^, E(»,,...,»„_,) nj"ri^^^», n"=I-i(i - m,) 
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b. We have 



^1 = 1 V /i (Xi) = - V {Yj)Iu,<wj 
m ^ m ^ ' 

i=l j=l 

N N 

^2 = - E ^ I iv, n, . . . , Fiv) (iso = - E (^') 

j=l i=l 

Since E(E {X\Y))=E{X), 

E (^2) = E ^1 (%^<^. |iV, Fi, . . . , r^v) j 
1 ^ 

Under quadratic loss, the risk of and ^2 are: 

R{Si) =E{5i-Eh{X)f 

= E (51) + E {E{h{X))f - 2E (5iE(ft(X))) 

= var {5i) - {E{Si)f + E (E {h{X))f - 2E (5iE 



and 



R{S2) =E{S2-Eh{X)f 



= E (Jf) + E {E{h{X))f - 2E {62E{h{X))) 

= var (Js) - iE{S2)f + E (E - 2E {S2E {h{X))) 

Since E (Ji) = E (^2), we only need to compare var (^i) and var {62)- Prom 
the definition of 5i and ^2, we have 

52{X)=E{5^{X)\Y) 

so 

var (E (Si)) = var (^2) < var (<5i) . 

Exercise 4.21 

a. Let us transform 3 into 3 = J ^^^fa)^^ 'm,{y)dy, where m is the marginal 
density of li. We have 
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= E 

neN 



P(iV = n) 
rn(y) 



E 



h{y)f{y) 
m{y\N = n) 



\N 



b. As j3 is constant, for every function c, 

3 = /3E[c(r)]+E ^^^{y ^ - 

c. The variance associated with an empirical mean of the 

h{Yi)f{Yi) 



m{Yi) 



IS 



varp) = I3\bx{c{Y)) + var 



( h{Y)f{Y) \ 



2/?cov 



m(y) 



AY) 



= (}\&r{c{Y)) - 2/3cov[rf(y),c(F)] + var(d(y)). 
Thus, the optimal choice of /3 is such that 

avar(5) 



dp 



= 



and is given by 



cov[d{Y),c{Y)] 
var(c(F)) ' 



d. The first choice of c is c{y) = II{y>j/o}, which is interesting when p = 
P{Y > 2/o) is known. In this case, 



Jy>vo 



v>yo 



m)2 



Thus, P* can be estimated using the Acccpt-rcjcct sample. A second choice 
of c is c{y) = y, which leads to the two first moments of Y. When those 
two moments mi and m2 are known or can be well approximated, the 
optimal choice of /3 is 



13* 



!yKy)f{y)dy-'^mx 



m2 



and can be estimated using the same sample or another instrumental 
density namely when y = J yh{y)f{y)dy is simple to compute, compared 
to 3. 
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Monte Carlo Optimization 



Exercise 5.1 

This is straiglitforward in R 

par (mf row=c (1,2), mar=c (4,4,1,1)) 

image (mul ,mu2 , -Hi ,xlab=expression(mu [1] ) ,ylab=expression(mu[2] ) ) 

contour (mul ,mu2 , -Hi , nle=100 , add=T) 

Nobs=400 

da=rnorm(Nobs) +2 . 5*sample (0:1, Nobs , rep=T , prob=c (1,3)) 
for (i in 1:250) 
for (j in 1:250) 

lli[i,j]=like(c(mul[i] ,mu2[j])) 
image (mul ,mu2 , -Hi , xlab=expression (mu [1] ) , ylab=expression (mu [2] ) ) 
contour (mul ,mu2 , -Hi ,nle=100, add=T) 

Figure 5.1 shows that the log-likelihood surfaces are quite comparable, despite 
being based on different samples. Therefore the impact of allocating 100 and 
300 points to both components, respectively, instead of the random 79 and 
321 in the current realisation, is inconsequential. 

Exercise 5.3 

Warning: as written, this problem has not simple solution! The 
constraint should be replaced with 



1^(1 + sin(y/3) cos(8a;)) + y^{2 + cos(5a;) cos{iy)) < 1 , 

We need to find a lower bound on the function of {x,y). The coefhcient of 
is obviously bounded from below by 1, while the coefhcient of is positive. 
Since the function is bounded from below by y^, this means that y'^ < 1, 
hence that sin(y/3) > sin(— 1/3) > —.33. Therefore, a lower bound on the 
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Fig. 5.1. Comparison of two log-likelihood surfaces for the mixture model (5.2) 
when the data is simulated with a fixed 100/300 ratio in both components (left) and 
when the data is simulated with a binomial S(400, 1/4) random number of points 
on the first component. 

function is 0.77x^ +2/^- If we simulate uniformly over the ellipse 0.77x^ +y'^ < 
1, we can subsample the points that satisfy the constraint. Simulating the 
uniform distribution on 0.77x^ + < 1 is equivalent to simulate the uniform 
distribution over the unit circle + < 1 and resizing z into x — z 

theta=runif (10'5) *2*pi 
rho=runif (10~5) 
xunif =rho*cos (theta) / . 77 
yunif =rho*sin(tlieta) 

plot (xunif , yunif , pch=19 , cex= . 4 , xlab="x" , ylab="y " ) 
const= (xunif ~2* (1+sin (yunif /3) *cos (xunif *8) ) + 

yunif ~2* (2+cos (5*xunif ) *cos (8*yunif ) ) <1) 
points (xunif [const] , yunif [const] , col="cornsilk2" ,pch=19 , cex= .4) 

While the ellipse is larger than the region of interest, Figure 5.2 shows 
that it is reasonably efficient. The performances of the method are given by 
suiii(const) /10~4, which is equal to 73%. 

Exercise 5.5 

Since the log-likelihood of the mixture model in Example 5.2 has been defined 

by 

#minus the log-likelihood function 
like=f unction (mu) { 

-sum (log ( ( . 25*dnorm(da-mu[l] )+ . 75*dnorm(da-mu[2] ) ) ) ) 

> 

in the mcsm package, we can reproduce the R program of Example 5.7 with 
the function h now defined as like. The difference with the function h of 
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o 




Fig. 5.2. Simulation of a uniform distribution over a complex domain via uni- 
form simulation over a simpler encompassing domain for 10^ simulations and an 
acceptance rate of 0.73%. 



Example 5.7 is that the mixture log-likehhood is more variable and thus 
the factors aj and /3j need to be calibrated against divergent behaviours. 
The following figure shows the impact of the different choices {aj,/3j) = 
(.01/log(j + l),l/logO- + (a„/3j) = (.l/log(j + l),l/logO- + 

(a,-,/3,) = (.01/log(j + l),l/log(j + l)-i), ^ (.1/ log(j + 1), 1/ log(j + 

1)'^), on the convergence of the gradient optimization. In particular, the sec- 
ond choice exhibits a particularly striking behavior where the sequence of 
{111,112) skirts the true mode of the likelihood in a circular manner. (The 
stopping rule used in the R program is (dif f <10~ (-5) ) .) 

Exercise 5.7 



The R function SA provided in Example 5.9 can be used in the following R 
program to test whether or not the final value is closer to the main mode or 
to the secondy mode: 

modes=matrix(0,ncol=2 ,nrow=100) 
prox=rep(0, 100) 
for (t in 1:100){ 
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Fig. 5.3. Four stochastic gradient paths for four different choices {aj,/3j) = 
(.01/log(i + l),l/log(j + ly") (u.Lh.s.), (a,, 13,) = (.f/log(j + l),l/log(j + l) '^) 
(u.r.h.s.), = (.Of/log(i + f),f/log(j + l)-i) (l.Lh.s.), = (.l/log(i + 

l),l/log(j + l) i) (Lr.h.s.). 



res=SA(mean(da)+rnorm(2) ) 
modes [t , ] =res$the [res$ite , ] 
dif f =modes [t , ] -c (0 , 2 . 5) 
duf f =modes [t , ] -c (2 . 5 , 0) 

prox [t] =sum(t (dif f )7.*y.dif f <t (duf f )7,*y.duf f ) 
} 

For each new temperature schedule, the function SA must be modified ac- 
cordingly (for instance by the on-line change SA=vi (SA) ) . Figure 5.4 illustrates 
the output of an experiment for four different schedules. 
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Fig. 5.4. Four simulated annealing outcomes corresponding to the temperature 
schedules Tt = l/llog(l + t), Tt = l/101og(l + t), Tt = l/10yiog(TTt), and 
Tt — (.95)^^*, based on 100 replications. (The percentage of recoveries of the main 
mode is indicated in the title of each graph.) 

Exercise 5.9 

In principle, Q{9'\9,x) should also involve the logarithms of 1/4 and 1/3, 
raised to the powers J2 ^^id ^ respectively. But, due to the loga- 
rithmic transform, the expression does not involve the parameter = {^1,1^2) 
and can thus be removed from (5(6''|6',x) with no impact on the optimization 
problem. 

Exercise 5.11 

Warning: there is a typo in Example 5.16. The EM sequence should 
be 
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instead of having .T4 in the denominator. 

Note first that some 1/4 factors have been removed from every term as they 
were not contributing to the likelihood maximisation. Given a starting point 
^0, the EM sequence will always be the same. 

x=c(58,12,9,13) 
n=suin(x) 

start=EM=cur=dif f = . 1 

while (diff>.001){ #stopping rule 

EM=c (EM , ( (cur*x [1] / (2+cur) ) +x [4] ) / ( (cur*x [1] / (2+cur) ) +x [2] +x [3] +x [4] ) ) 

dif f =abs (cur-EM [length(EM) ] ) 

cur=EM [length (EM) ] 

} 

The Monte Carlo EM version creates a sequence based on a binomial simula- 
tion: 

M=10-2 

MCEM=matrix (start , ncol=length (EM) , nrow=500) 
for (i in 2 : length (EM) ){ 

MCEM [ , i] =1/ ( 1+ (x [2] +x [3] ) / (x [4] +rbinom(500 , M*x [1] , 

prob=l/ (1+2/MCEM [ , i-1] ) ) /M) ) 

} 

plot(EM,type="l" ,xlab="iterations" ,ylab="MCEM sequences") 
upp=apply (MCEM , 2 , max) ; dow=apply (MCEM , 2 , min) 

polygon(c(l: length (EM) .length (EM) :1) , c (upp , rev (dow) ) ,col="grey78") 

lines (EM, col="gold" , lty=2 , lwd=2) 

} 

and the associated graph shows a range of values that contains the true EM 
sequence. Increasing M in the above R program obviously reduces the range. 

Exercise 5.13 

The R function for plotting the (log-)likclihood surface associated with (5.2) 
was provided in Example 5.2. We thus simply need to apply this function to 
the new sample, resulting in an output like Figure 5.5, with a single mode 
instead of the usual two modes. 

Exercise 5.15 

Warning: there is a typo in question a where the formula should 
involve capital Zj's, namely 
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^1 

Fig. 5.5. Log-likeliliood surface of a mixture model applied to a five component 
mixture sample of size 400. 




a. The likelihood is 



12 

i=l 

and the complete-data likelihood is 

12 

L%e\^, z) = n [pAe-^-'I(,,=i) + (1 - p)Me-^^'I(,^^2)], 
1=1 

where 9 — (p, A, /i) denotes the parameter, using the same arguments as 
in Exercise 5.14. 

b. The EM algorithm relies on the optimization of the expected log-likelihood 
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The arguments of the maximization problem are 

=P/12 

A(i+1) = "52/-?, 



where 



Si = TZi^iPe,SZi = l\xi) 



{S2 = TZi^iPe,AZi = 'i\xi) 



with 



p. (Z. = l\x) = ^0-)^0)^"'"'"^ 

' ' ^(i)Aa)e-^")^^ + (l-PO))/iO)e-^0)-^ 

An R implementation of the algorithm is then 

x=c (0.12, 0.17, 0.32, 0.56, 0.98, 1-03, 1-10, 1-18, 1-23, 1.67, 1.68, 2. 33) 

EM=cur=c(. 5, jitter (mean (x) ,10) .jitter (mean (x) ,10)) 

diff=l 

while (diff*10"5>l){ 



probs=l/ (1+ (1-cur [1] ) *dexp (x , cur [3] ) / (cur [1] *dexp (x , cur [2] ) ) ) 

phat=sum(probs) ;Sl=sum(x*probs) ;S2=sum(x*(l-probs)) 

EM=rb ind ( EM , c ( phat / 1 2 , S 1 /phat , S 2 /phat ) ) 

diff=sum(abs(cur-EM[dim(EM) [1] ,])) 

cur=EM[dim(EM) [1] ,] 

} 

and it always produces a single component mixture. 
Exercise 5.17 



Warning: Given the notations of Example 5.14, the function (j) in 
question b should be written (f... 

a. The question is a bit vague in that the density of the missing data 
{Zn-m+l^ • • • , ^n) IS a normal M{9, 1) density if we do not condition on 
y. Conditional upon y, the missing observations Zi arc truncated in a, 
i.e. we know that they are larger than a. The conditional distribution of 
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the Zi's is therefore a normal J^{0, 1) distribution truncated in a, with 
density 

X exp{-fe - 6')V2} ^ ip(z-9) ^ 
f(z\e, y) = ^-^—^ lz> a.= '— Iz> a. 

where ip and are the normal pdf and cdf, respectively, 
b. We have 



<fi{a - 6) 



since ip'{x) = —xip{x). 
Exercise 5.19 

Running uniroot on both intervals 

> h=f unction(x) { (x-3) * (x+6) * ( 1+sin (60*x) ) > 

> uniroot (h,int=c (-2, 10)) 
$root 

[1] 2.999996 

$f .root 

[1] -6.853102e-06 

> uniroot (h, int=c (-8, 1) ) 
$root 

[1] -5.999977 
$f . root 

[1] -8.463209e-06 

misses all solutions to 1 + sin(60a;) = 



Exercise 5.21 



Warning: this Exercise duplicates Exercise 5.11 and should not have 
been included in the book! 
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Metropolis-Hastings Algorithms 



Exercise 6.1 

A simple R program to simulate this chain is 

# (C.) Jiazi Tang, 2009 

x=l:10~4 

x[l]=rnonn(l) 

r=0.9 

for (i in 2:10'4)-C 

x[i]=r*x[i-l]+rnorm(l) ]■ 
hist (x , f req=F , col="wheat2 " ,main=" " ) 
curve (dnorm(x,sd.=l/sqrt (l-r~2) ) ,add=T, col=" tomato" 

Exercise 6.3 

When q{y\x) = g{y), we have 




Since the acceptance probability satisfies 



f{y) gjx) 
f{x) g(y) 



> 



max f(x) / g{x) 



.f{y)/9{y) 



it is larger for Metropolis-Hastings than for accept-reject. 
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Exercise 6.5 

a. The first property follows from a standard property of the normal dis- 
tribution, namely that the linear transform of a normal is again normal. 
The second one is a consequence of the decomposition y = Xj3 -\- e, when 
e ~ Afn{0, (y^In) is independent from X/3. 

b. This derivation is detailed in Marin and Robert (2007, Chapter 3, Exercise 
3.9). 

Since 

y|a^ X ~ Nn{Xp, cj\lr, + nX{X^X)-^X^)) , 
integrating in cr^ with 7r(cr^) = yields 



f{y\X) = (n+ l)-('=+i)/27r-"/2r(n/2) 

-n/2 

0''X'^ XP ' 



1 



Using the R function dmt (mnormt ) , we obtain the marginal density for 
the Swiss dataset: 

> y=log(as .vector (swiss [, 1] ) ) 

> X=as.matrix(swiss[,2:6]) 

> library (mnormt) 

> dmt(y,S=diag(length(y))+xy.*%solve(t(X)'/.*y,X)%*7,t(X) ,d=length(y)-l) 

[1] 2.096078e-63 

with the prior value /3 = 0. 
Exercise 6.7 

a. Wc generate an Metropolis-Hastings sample from the Se(2.7, 6.3) density 
using uniform simulations: 

# (C.) Thomas Bredillet, 2009 

Nsim=10'4 

a=2.7;b=6.3 

X=runif (Nsim) 

last=X[l] 

for (i in l:Nsim) { 

cand=rbeta(l ,1,1) 

alpha=(dbeta(ccind, a,b) /dbetaClast , a,b) )/ 
(dbetaCcand, 1 , l)/dbeta(last ,1,1)) 
if (runif (l)<alpha) 

last=Ccind 
X[i]=last 

} 

hist (X,pro=TRUE,col="wheat2" ,xlab="" ,ylab="" ,main="Beta(2.7,3) simulation") 
curve (dbeta (x , a , b) , add=T , lwd=2 , col=" sienna2 " ) 
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The acceptance rate is estimated by 

> length (unique (X))/5000 
[1] 0.458 

If instead we use a Be{20, 60) proposal, the modified lines in the R program 

are 

caiid=rbeta (20 ,60,1) 

alpha=(dbeta(caiid, a,b)/dbeta(last , a,b) ) / 

(dbeta(cand , 20 , 60) /dbeta(last , 20 , 60) ) 

and the acceptance rate drops to zero! 
b. In the case of a truncated beta, the following R program 

Nsim=5000 

a=2 . 7 ; b=6 . 3 ; c=0 . 25 ; d=0 . 75 
X=rep(runif (1) ,Nsim) 
test2=f unction(){ 
last=X[l] 

for (i in l:Nsim){ 

caiid=rbet a ( 1 , 2 , 6 ) 

alpha=(dbeta(cand, a,b) /dbetaClast , a,b) )/ 
(dbeta(cand , 2 , 6) /dbeta(last ,2,6)) 
if ((runif (l)<alpha)&&(cand<d)&&(c<caiid)) 

last=cand 
X[i]=last} 

} 

testl=f unction(){ 
last=X[l] 

for (i in l:Nsiin)-C 

cand=runif ( 1 , c , d) 

alpha=(dbeta(cEind,a,b)/dbeta(last , a,b) ) 
if ((runif (l)<alpha)&&(cand<d)&&(c<caiid)) 

last=cand 
X[i]=last 
> 

} 

systein.time(testl () ) ; system. time (test2()) 

shows very similar running times but more efiiciency for the beta proposal, 
since the acceptance rates are approximated by 0.51 and 0.72 for testl 
and test2, respectively. When changing to c = 0.25, d = 0.75, testl is 
more efiicient than test2, with acceptances rates of approximately 0.58 
and 0.41, respectively. 

Exercise 6.9 

a. The Accept Reject algorithm with a Gamma Q{A, 7) candidate can be 
implemented as follows 
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# (C.) Jiazi Tang, 2009 
g47=rgainma (5000 ,4,7) 

u=runif (5000 , max=dgamma (g47 ,4,7)) 

x=g47 [u<dgamina (g47 ,4.3,6.2)] 

par (mf row=c (1,3), mar=c (4,4,1,1)) 

hist (x , f req=FALSE , xlab= " " , ylab= " " , col= " wheat 2 " , 

inain=" Accept -Reject with Ga(4.7) proposal") 

curve (dgamma (x , 4 . 3 , 6 . 2) , lwd=2 , col=" sienna" , add=T) 

The efficiency of the simulation is given by 

> length (x)/5000 
[1] 0.8374 

b. The MetropoUs-Hastings algorithm with a Gamma t/(4, 7) candidate can 
be implemented as follows 

# (C.) Jiazi Tang, 2009 
X=rep (0,5000) 
X[l]=rgamma(l,4.3,6.2) 
for (t in 2: 5000) { 

rho= (dgamma (X[t-1] ,4,7)*dgamma(g47[t] ,4.3,6.2))/ 

(dgamma (g47 [t] ,4,7)*dgamma(X[t-l] ,4.3,6.2)) 
X [t] =X [t-1] + (g47 [t] -X [t-1] ) * (runif (1) <rho) 
} 

hist (X,f req=FALSE,xlab=" " ,ylab=" " , col="wheat2" , 
main="Metropolis-Hastings with Ga(4,7) proposal") 
curve (dgamma (x , 4 . 3 , 6 . 2) , lwd=2 , col=" sienna" , add=T) 

Its efficiency is 

> length(unique(X))/5000 
[1] 0.79 

c. The Metropolis-Hastings algorithm with a Gamma G{5,6) candidate can 
be implemented as follows 

# (C.) Jiazi Tang, 2009 
g56=rgamma (5000 ,5,6) 
X[l]=rgainma(l,4.3,6.2) 
for (t in 2: 5000) { 

rho=(dgamma(X[t-l] ,5,6)*dgamma(g56[t] ,4.3,6.2))/ 

(dgamma (g56 [t] ,5,6)*dgamma(X[t-l] ,4.3,6.2)) 
X [t] =X [t-1] + (g56 [t] -X [t-1] ) * (runif (1) <rho) 
> 

hist (X , f req=FALSE , xlab=" " , ylab=" " , col=" wheat2 " , 
main="Metropolis-Hastings with Ga(5,6) proposal") 
curve (dgamma (x , 4 . 3 , 6 . 2) , lwd=2 , col=" sienna" , add=T) 

Its efficiency is 



6 Metropolis-Hastings Algorithms 49 



> length (unique (X))/5000 
[1] 0.7678 

which is therefore quite similar to the previous proposal. 
Exercise 6.11 

1.. Using the candidate given in Example 6.3 mean using the Braking R 

program of our package mcsm. In the earlier version, there is a missing 
link in the R function which must then be corrected by changing 

data=read.. table (" BrakingData. txt ", Sep = "",header=T) 
x=data[, 1] 
y=data [ , 2] 

into 

x=cars [, 1] 
y=cars [,2] 

In addition, since the original Braking function does not return the sim- 
ulated chains, a final line 

list (a=blhat ,b=b2hat , c=b3hat , sig=s2hat) 

must be added into the function. 

2.. If we save the chains as mcmc=Braking() (note that wc use 10'^ simulations 
instead of 500), the graphs assessing convergence can be plotted by 

par(mfrow=c(3,3) ,mar=c(4,4,2,l)) 
plot (incinc$a,type="l" ,xlab=" " ,ylab="a") ;acf (mciiicla) 
hist (mcmc$a,prob=T,main=" " ,yla=" " ,xla="a" , col="wheat2") 
plot (incinc$b , type= " 1 " , xlab= " " , ylab= "b " ) ; acf (incmc$b) 
hist (incinc$b,prob=T,inain=" " ,yla=" " ,xla="b" , col="wheat2") 
plot (mcmc$c,type="l" ,xlab=" " ,ylab="c") ; acf (mcmc$c) 
hist (mcmc$c,prob=T,main="" ,yla=" " ,xla="c" , col="wheat2") 

Autocorrelation graphs provided by acf show a strong correlation across 
iterations, while the raw plot of the sequences show poor acceptance rates. 
The histograms are clearly unstable as well. This 10^ iterations do not 
appear to be sufficient in this case. 
3.. Using 

> quantile(mcmc$a,c(.025, .975)) 

2 . 57. 97 . 57. 
-6.462483 12.511916 

and the same for b and c provides converging confidence intervals on the 
three parameters. 
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Warning: There is a typo in question b in that the candidate must 
also be a double-exponential for a, since there is no reason for a to 
be positive... 

1. The dataset challenger is provided with the mcsm package, thus available 
as 

> library (mcsm) 

> data (challenger) 

Running a regular logistic regression is a simple call to glm: 

> temper=challenger [, 2] 

> f ailur=challenger [, 1] 

> summary (glm(failur~temper, family = binomial)) 

Deviance Residuals: 

Min IQ Median SQ Max 

-1.0611 -0.7613 -0.3783 0.4524 2.2175 



Coefficients : 

Estimate Std. Error z value Pr(>|z|) 
(Intercept) 15.0429 7.3786 2.039 0.0415 * 
temper -0.2322 0.1082 -2.145 0.0320 * 

Signif. codes: .001 "**" .01 .05 "." .1 "" 1 

(Dispersion peirameter for binomial family taken to be 1) 

Null devieince: 28.267 on 22 degrees of freedom 
Residual devieince: 20.315 on 21 degrees of freedom 

AIC: 24.315 

The MLE's and the associated covariance matrix are given by 

> challe=summary(glm(f ailur~temper , family = binomial)) 

> beta=as .vector (challe$coef [, 1] ) 

> challe$cov. unsealed 

(Intercept) temper 
(Intercept) 54.4441826 -0.79638547 
temper -0.7963855 0.01171512 

The result of this estimation can be checked by 

plot (temper , f ailur ,pch=19 , col="red4" , 
xlab= "temperatures " , ylab= " f allures " ) 

curve (1/ (l+exp(-beta [1] -beta[2] *x) ) , add=TRUE, col="gold2" , lwd=2) 
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and the curve shows a very clear impact of the temperature. 

2. The Metropolis- Hastings resolution is based on the challenge (mcsm) 
function, using the same prior on the coefScients, a ~ A/'(0, 25), ^ 
A/'(0, 25/s^), where is the empirical variance of the temperatures. 

Nsim=10~4 
x=temper 
y=f ailur 
siginaa=5 
sigmab=5/sd(x) 

lpost=function(a,b){ 

sum(y*(a+b*x)-log(l+exp(a+b*x) ) )+ 

dnorm(a, sd=sigmaa, log=TRUE) +dnorin(b , sd=sigmab , log=TRUE) 
> 

a=b=rep(0,Nsim) 
a[l]=beta[l] 

b[l]=beta[2] 

#scale for the proposals 
scala=sqrt(challe$cov.un[l, 1] ) 
scalb=sqrt(challe$cov.un[2,2] ) 

for (t in 2:Msiin){ 

propa=a[t-l] +Scmiple (c (-1 , 1) , 1) *rexp(l) *scala 
if (logCrunif (l))<lpost(propa,b[t-l] )- 

lpost(a[t-l] ,b[t-l])) a[t]=propa 
else a[t]=a[t-l] 

propb=b [t-1] +sample (c (-1 , 1) , 1) *rexp (1) *scalb 
if (logCrunif (l))<lpost (a [t] ,propb)- 

lpost(a[t] ,b[t-l])) b[t]=propb 
else b[t]=b[t-l] 
} 

The acceptance rate is low 

> length (unique (a) )/Nsim 
[1] 0.1031 

> length(unique(b))/Nsim 
[1] 0.1006 

but still acceptable. 

3. Exploring the output can be done via graphs as follows 

par(mf row=c(3,3) ,mar=c(4,4,2, 1)) 

plot(a,type="l" ,xlab=" iterations" ,ylab=expression(alpha) ) 
hist (a,prob=TRUE, col="wheat2" ,xlab=expression (alpha) ,inain=" ") 
acf (a,ylab=expression(alpha) ) 
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plot(b,type="l" ,xlab=" iterations" ,ylab=expression(beta) ) 
hist (b ,prob=TRUE, col="wheat2" ,xlab=expression(beta) ,main=" ") 
acf (b,ylab=expression(beta) ) 

plot (a,b,type="l" ,xlab=expression(alpha) ,ylab=expression(beta) ) 
plot (temper , f ailur ,pch=19 , col="red4" , 

xlab=" temperatures" ,ylab=" failures") 
for (t in seq(100,Nsim,le=100)) curve(l/(l+exp(-a[t]-b[t]*x)) , 

add=TRUE,col="grey65" ,lwd=2) 
curve (1/ ( 1+exp (-mean(a) -mean(b) *x) ) , add=TRUE , col="gold2" , lwd=2 . 5) 
postal=rep(0,1000) ;i=l 

for (t in seq(100,Nsim,le=1000)){ postal [i] =lpost (a [t] ,b [t] ); i=i+l} 
plot(seq(100,Nsim,le=1000) .postal ,type="l" , 

xlab= " it erat ions " , ylab= " log-posterior " ) 
abline (h=lpost (a [1] ,b [1] ) , col="sieniia" , lty=2) 

which shows a slow convergence of the algorithm (see the acf graphs on 
Figure 6.1!) 
4. The predictions of failure are given by 

> mean(l/(l+exp(-a-b*50))) 
[1] 0.6898612 

> mean(l/(l+exp(-a-b*60))) 
[1] 0.4892585 

> mean (l/( 1+exp (-a-b*70))) 
[1] 0.265691 

Exercise 6.15 

Warning: There is a typo in question c, which should involve J\f{0, u)) 
candidates instead of £(0,w)... 

a. An R program to produce the three evaluations is 

# (C.) Thomas Bredillet, 2009 

Nsim=5000 

A=B=rmif (Nsim) 

alpha=l ; alpha2=3 

last=A[l] 

a=0;b=l 

caiid=if else(runif (Nsim)>0 . 5, 1 ,-1) * rexp (Nsim) /alpha 
for (i in l:Nsim){ 

rate=(dnorm(cand[i] ,a,b"2)/dnorm(last ,a,b"2))/ 
(exp (-alpha* abs (cand [i] ) ) /exp (-alpha*abs (last) ) ) 

if (runif (l)<rate) last=caiid[i] 

A[i]=last 

} 

CcLnd=if else (runif (Nsim) >0 . 5, 1 ,-1) * rexp (Nsim) /alpha2 
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Fig. 6.1. Graphical checks of the convergence of the Metropolis-Hastings algorithm 
associated with the challenger dataset and a logistic regression model. 



for (i in l:Nsini) { 

rate=(dnorm(caiid [i] , a,b~2) /dnormdast ,a,b~2) ) / 

(exp(-alpha2*abs (ccLnd [i] ) ) / exp (-alpha2*abs (last) ) ) 

if (runif (1) <rate) last=cand[i] 

B[i]=last 

} 

par (mfrow=c(l,3) ,mar=c(4,4,2,l)) 
estl=cmnsum(A) / (1 : Nsim) 
est2=cmnsum(B)/(l :Nsim) 

plot(estl,type="l" ,xlab="iteratioiis" ,ylab=" " ,lwd=2) 
lines (est2, lwd="2" ,col="gold2") 
acf (A) 
acf (B) 



54 6 Metropolis-Hastings Algorithms 



b. The acceptance rate is given by length(unique(B))/Nsiin, equal to 0.49 
in the current simulation. A plot of the acceptance rates can be done via 

the R program 

alf=seq(l,10,le=50) 

candO=ifelse(runif (Nsiiii)>0.5,l,-1) * rexp(Nsiiii) 

acce=rep(0,50) 

for (j in 1:50){ 

cand=candO/ alf [ j ] 

last=A[l] 

for (i in 2:Nsiin)-[ 

rate=(dnorin(cand[i] ,a,b"2)/dnonn(last ,a,b"2) )/ 

(exp (-alf [j] *abs (cand [i] ) ) /exp(-alf [j] *abs (last) ) ) 

if (runif (l)<rate) last=cand[i] 

A[i]=last 

} 

acce [j] =length (unique (A) ) /Nsim 
} 

par (mf row=c (1,3), mar=c (4,4,2,1)) 

plot (alf , acce ,xlab=" " ,ylab="" ,type="l" ,main="Laplace iid") 

The highest acceptance rate is obtained for the smallest value of a. 

c. The equivalent of the above R program is 

oine=sqrt (seq( .01,10, le=50) ) 

candO=rnorm(Nsim) 
acce=rep(0,50) 
for (j in 1:50){ 

cand=candO*oine [j] 

last=A[l] 

for (i in 2: Nsim) { 

rate=(dnorm(cand [i] ,a,b~2) /dnorm(last , a,b~2) ) / 

(dnorm(cEind[i] , sd=ome [j] ) /dnorm(last , sd=ome [j] ) ) 

if (runif (l)<rate) last=cand[i] 

A[i]=last 

} 

acce [j ] =length (unique (A) ) /Nsim 
} 

plot(ome~2,acce,xlab="" ,ylab="" ,type="l" ,main="Normal iid") 

The highest acceptance rate is (unsurprisingly) obtained for iv close to 1. 

d. The equivalent of the above R program is 

alf=seq(.l,10,le=50) 

Ceind0=if else (runif (Nsim) >0 . 5, 1 ,-1) * rexp(Nsim) 
acce=rep(0,50) 
for (j in 1:50){ 
eps=CcindO/alf [j] 
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last=A[l] 

for (i in 2:Nsim){ 
cand [i] =last+eps [i] 

rate=dnorm(caiid[i] ,a,b"2)/dnorin(last ,a,b"2) 
if (runif (l)<rate) last=caiid [i] 
A[i]=last 
> 

acce [j ] =length (unique (A) ) /Nsim 
} 

plot (alf, acce, xlab="" ,ylab="" ,type="l" ,inain=" Laplace random walk") 

Unsurprisingly, as a increases, so does the acceptance rate. However, given 
that this is a random walk proposal, higher acceptance rates do not mean 
better performances (see Section 6.5). 
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Gibbs Samplers 



Exercise 7.1 

The density gt of {Xt, Yj) in Algorithm 7 is decomposed as 

gt{Xt, Yt\Xt_u ...Xo, Ft-i, ...Yo)= gt,x\Y{Xt\Yt, Xt-i, ...Xq, Yt-i, • • • Fo) 
xgt,YiYt\Xt-i,...Xo,Yt-u---Yo) 

with 

.^.y^l^t-i, ... ^0, Yt_u- --Yo)^ fY\xiYt\Xt^i) 

which only depends on Xi^i, . . . Xq, it-i, ■ ■ - Yq through Xt-i, according to 
Step 1. of Algorithm 7. Moreover, 

9t,X\Y{Xt\'Yt, ^t-l, ■ ■ ■ Xq, Yt_i,. . . Yq) = fx\Y{Xt\Yt) 

only depends on Xt-2, ■ ■ ■ Xq, Yt,...Yo through Yt- Therefore, 

gt{Xu Yt\Xt_r, ...Xo, Ft-i, . . . Fq) = 9t{Xt, Yt\Xt-i) , 
which shows this is truly an homogeneous Markov chain. 

Exercise 7.5 

a. The (normal) full conditionals are defined in Example 7.4. An R program 
that implements this Gibbs sampler is 

# (C.) Anne Sabourin, 2009 
T=500 ;p=5 ;r=0.25 
X=cur=rnorm(p) 
for (t in 1 :T){ 
for (j in 1 :p){ 

m=sum(cur [-j] ) / (p-1) 

cur [j]=rnorin(l, (p-l)*r*in/(l+(p-2)*r) , 
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sqrt ( (1+ (p-2) *r- (p-1) *r-2) / (1+ (p-2) *r) ) ) 

} 

X=cbind(X, cur) 
> 

par (mf row=c (1,5)) 

for (i in l:p){ 

hist (X [i ,] ,prob=TRUE, col="wheat2" ,xlab=" " ,main=" ") 
curve (dnorm(x) ,add=TRUE,col="sieima" ,lwd=2)} 

b. Using instead 

J=matrix(l ,ncol=5,nrow=5) 
I=diag(c(l,l,l,l,l)) 
s=(l-r)*I+r*J 
nnnorin(500,s) 

and checking the duration by system. time shows rmnorm is about five 
times faster (and exact!). 

c. If we consider the constraint 

m p 

it imposes a truncated normal full conditional on all components. Indeed, 
for 1 < i < m, 

p m 

^» - E ^ E ^3 ' 

while, for i > m, 

p m 

E ^l-E^?- 

j=m+l,j/i j = l 

Note that the upper bound on when i < m cannot be negative if we 
start the Markov chain under the constraint. The cur [j] =rnorm( . . . line 
in the above R program thus needs to be modified into a truncated normal 
distribution. An alternative is to use a hybrid solution (see Section 7.6.3 
for the validation): we keep generating the a;,'s from the same plain normal 
full conditionals as before and we only change the components for which 
the constraint remains valid, i.e. 

for (j in l:m)-C 

mea=sum(cur [-j] )/(p-l) 

prop=rnorm ( 1 , (p-l)*r*mea/ (l+(p-2)*r) , 

sqrt ( ( 1+ (p-2) *r- (p-1) *r-2) / (1+ (p-2) *r) ) ) 
if (sum(cur[(l:m) [-j]] "2+prop"2)<sum(cur [(m+1) :p] "2)) 
cur [j]=prop 

} 
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for (j in (m+l):p){ 

mea=sum(cur [-j] ) / (p-1) 

prop=rnorm ( 1 , (p-l)*r*mea/ (l+(p-2)*r) , 

sqrt ( (1+ (p-2) *r- (p-1) *r-2) / (1+ (p-2) *r) ) ) 
if (sum(cur[(l:m)]"2)<suin(cur[((m+l) :p) [-j] ] "2+prop~2) ) 
cur [j] =prop 

> 

Comparing the histograms with the normal jV(0, 1) shows that the marginals 
are no longer normal. 

Exercise 7.7 

Warning: There is a typo in Example 7.6, namely that the likeli- 
hood function involves <P(d — a)"~"^ in front of the product of normal 
densities... For coherence with Examples 5.13 and 5.14, in both Ex- 
ample 7.6 and Exercise 7.7, x should be written y, z z, x y and Xi 

a. The complete data likelihood is associated with the distribution of the 
uncensored data 

(2/1 ) • * • ) •2'm+l 5 • • * ? ^n) j 

which constitutes an iid sample of size n. In that case, a sufficient statistics 
is {my + (n — m{z)}/n, which is distributed as Af{9, 1/n), i.e. associated 
with the likelihood 

In this sense, the likelihood is proportional to the density of ~ J\f{{mx + 
(n — m)z}/n, 1/n). (We acknowledge a certain vagueness in the wording 

of this question!) 

b. The full R code for the Gibbs sampler is 

xdata=c (3. 64, 2. 78, 2. 91, 2. 85, 2. 54, 2. 62, 3. 16, 2. 21, 4. 05, 2. 19, 
2.97,4.32,3.56,3.39,3.59,4.13,4.21,1.68,3.88,4.33) 

m=length (xdata) 

n=30;a=3.5 #1/3 missing data 

nsini=10"4 

xbar=meaii (xdata) 

that=array(xbar ,dim=c(nsiiii, 1) ) 

zbar =arr ay (a,diin=c(nsim,l)) 

for (i in 2:nsim){ 

teinp=runif (n-m,min=pnorin(a,niean=that [i-1] ,sd=l) ,max=l) 
zbar [i]=mean(qnonn (temp, mean=that [i-1] ,sd=l)) 
that [i] =rnorm(l ,mecai=(m*xbar+(n-m) *zbar [i] )/n, 
sd=sqrt(l/n)) 
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} 

par (mf row=c (1 , 2) ,mar=c (5,5,2,1)) 

hist (that [500:nsiin] ,col="grey" ,breaks=25, 

xlab=expression(theta) ,main=" " ,freq=FALSE) 

curve (dnorm(x,mean(that) , sd=sd(that) ) ,add=T,lwd=2) 

hist (zbar [500 :nsiin] , col="grey" ,breaks=25 

inain="" ,xlab= expressionCbar (Z) ) ,freq=FALSE) 

curve (dnorin(x, mean (zbar) , sd=sd(zbar) ) ,add=T,lwd=2) 

(Wc added the normal density curves to check how close to a normal 
distribution the posteriors are.) 

Exercise 7.9 



a. Given the information provided in Table 7.1, since we can reasonably as- 
sume independence between the individuals, the distribution of the blood 
groups is a multinomial distribution whose density is clearly proportional 
to 

{p\ + 2pAPor^ iPB + ^PBPoT^ (paPbT^^ (pIT" ■ 
the proportionality coefficient being the multinomial coefficient 

( " )• 

\nA ub riAB no J 

b. If we break nA into Za individuals with genotype AA and ua — Za with 
genotype AO, and similarly, rig into Zb individuals with genotype BB and 

Ub — Zb with genotype BO, the complete data likelihood corresponds to 
the extended multinomial model with likelihood proportional to 

c. The Gibbs sampler we used to estimate this model is 

nsim=5000;nA=186;nB=38;nAB=13;n0=284; 

pA=array( . 25,dim=c(nsiin, 1) ) ;pB=array ( . 05,diin=c(nsim, 1) ) ; 
for (i in 2:nsiin){ 
pO=l-pA[i-l]-pB[i-l] 

ZA=rbinom(l ,nA,pA [i-1] "2/ (pA [i-1] -2+2*pA [i-1] *pO) ) ; 
ZB=rbinoin(l ,nB,pB [i-1] "2/ (pB [i-1] -2+2*pB [i-1] *pO) ) ; 
temp=rdirichlet (1 , c (nA+nAB+ZA+1 ,nB+nAB+ZB+l , 

nA-ZA+nB-ZB+2*nO+l)) ; 
pA [i] =teinp [1] ; pB [i] =teinp [2] ; 
} 

par(mf row=c(l ,3) ,mar=c(4,4,2, 1)) 

hist (pA,main=expression(p [A] ) ,f req=F, col="wheat2") 
hist (pB,main=expression(p [B] ) ,f req=F, col="wheat2") 
hist(l-pA-pB, ,main=expression(p [0] ) ,freq=F,col="wheat2") 
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It uses the Dirichlet generator rdirichlet found in the mcsm package. 
Exercise 7.11 

a. For the target density fx{x) = |e~^, a shce samphng algorithm is based 
on the full conditionals 

a) C/(*+i)^^o./x (.('))] 

b) ~ with = {x,f{x) > 

Therefore, U\x ~ W(0, ^e~^) and, since A = {x, \e~^ > u}, i.e. A = 
{a;,0 < a; < log(2?i)}, owe also deduce that X\u ~ l({0, (log(2u))2). The 
corresponding R code is 

T=5000 

f=function(x){ 

l/2*exp(-sqrt(x))} 
X=c(runif (D) ;U=c(runif (1)) 
for (t in 1:T){ 

U=c(U,runif (l,0,f (X[t] ))) 

X=c (X , runif (1 , , (log (2*U [t+1] ) ) "2) ) 

} 

par (mf row=c (1,2)) 

hist (X , prob=TRUE , col= "wheat2 " , xlab=" " , main=" " ) 
acf (X) 

b. If we define Y = \fX, then 

P(y <y)= P{X < y^) 

Jo 2 

When we differentiate against y, we get the density 

fyiy) =yexp{-y) 

which implies that Y ~ Ga{2, 1). Simulating X then follows from X = Y^. 
This method is obviously faster and more accurate since the sample points 
are then independent. 

Exercise 7.13 

a. The linear combinations X + Y and X — Y also are normal with null 

expectation and with variances 2(1 + p) and 2(1 — p), respectively. The 
vector {X + Y,X — Y) itself is equally normal. Moreover, 

cov(X + Y,X-Y)= E{{X + Y){X - Y)) = E{X'^ -Y"^) = 1-1 = 

implies that X + Y and X — Y are independent. 
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b. If, instead, 

then al ^ implies that (X + Y) and {X — Y) are dependent since 
E((X + Y){X - Y)) =<tI- al- In this case, X\Y = y ~ M{p^y, - 
p^)). We can simulate {X,Y) by the following Gibbs algorithm 

T=5000 ; r=0 . 8 ; sx=50 ; sy=100 
X=rnorm(l) ;Y=rnorm(l) 
for (t in 1:T){ 

Yn=rnonii(l ,r*sqrt (sy/sx) *X [t] , sqrt (sy* (l-r~2) ) ) 

Xn=rnorm ( 1 , r*sqrt (sx/sy) *Yn, sqrt (sx* ( l-r~2) ) ) 

X=c(X,Xn) 

Y=c(Y,Yn) 

} 

par (mf row=c (3 , 2) , oma=c (0,0,5,0)) 
hist(X,prob=TRUE,main=" " ,col="wheat2") 
hist (Y,prob=TRUE,main=" " ,col="wheat2") 
acf (X) ; acf ( Y) ; plot (X , Y) ; plot (X+Y , X-Y) 

c. If (Jx 7^ cTy, let us find a G K such that X + aY and Y are independent. We 
have E[(X + ay)(y)] = if and only if pax(Ty + aay = 0, i.e. a = —pax/cry. 
Therefore, X — pax/cTyY and Y are independent. 

Exercise 7.15 

a. The likelihood function naturally involves the tail of the Poisson distribu- 
tion for those observations larger than 4. The full conditional distributions 
of the observations larger than 4 are obviously truncated Poisson distribu- 
tions and the full conditional distribution of the parameter is the Gamma 
distribution associated with a standard Poisson sample. Hence the Gibbs 
sampler. 

b. The R code we used to produce Figure 7.13 is 

nsim=10~3 

lain=RB=rep (313/360, nsim) 

z=rep(0,13) 

for (j in 2: nsim) { 

top=round(lain[j -l]+6*sqrt(lEun[j -1])) 

prob=dpois(c(4:top) ,lain[j -1]) 

cprob=cumsum (prob/ sum (prob) ) 

for(i in 1:13) z[i] = 4+suin(cprob<runif (1) ) 

RB [j] = (313+siim(z) ) /360 

lain[j]=rgaimna(l,360*RB[j] ,scale=l/360) ; 

} 

par(mfrow=c(l ,3) ,mar=c(4,4,2, 1)) 
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hist (lam, col=" grey" ,breaks=25 ,xlab=" " , 

main= "Empirical average") 
plot (cuinsmn(lain)/l :nsiin,yliin=c(l , 1 . 05) ,type="l" , 

lwd=1.5,ylab=" ") 
lines (cumsum(RB) /I :nsim, col=" sienna" , lwd=l . 5) 
hist (RB, col=" sienna" ,breaks=62,xlab=" " , 

inain= " Rao-Blackwell " , xliin=c (1 , 1 . 05 ) ) 

c. When checking the execution time of both programs with system. time, 
the first one is almost ten times faster. And completely correct. A natural 
way to pick prob is 

> qpoisC 9999, lam [j-1]) 
[1] 6 

Exercise 7.17 

a. The R program that produced Figure 7.14 is 

nsim=10"3 
X=Y=rep(0,nsim) 

X[l]=rexp(l) #initialize the chain 

Y[l]=rexp(l) #initialize the chain 

for(i in 2:nsiin){ 



st=0 . l*nsim 

par (mf row=c(l ,2) ,mar=c(4,4,2,l)) 

hist (X , col= " grey " , breaks=25 , xlab= " " , main= " " ) 

plot(cumsiim(X) [(st+1) :nsim]/(l: (nsim-st)) ,type="l" ,ylab="") 

b. Using the Hammcrslcy Clifford Theorem per se means using f{y\x) / f{x\y) = 
x/y which is not integrable. If we omit this major problem, we have 



(except that the proportionality term is infinity!), 
c. If wc constrain both conditionals to (0, B), the Hammersley-Clifford The- 
orem gives 



X [i] =rexp (1 , rate=Y [i-1] ) 
Y [i] =rexp ( 1 , rate=X [i] ) 
> 



X ex.p{—xy} 
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f{x,y) 



exp{— a;y}/(l — e 



j y{l - e-B) 



exp{-a;2/} 



J y 



oc 



exp{-a;y} , 



since the conditional exponential distributions are truncated. This joint 
distribution is then well-defined on (0, -B)^. A Gibbs sampler simulating 
from this joint distribution is for instance 

B=10 

X=Y=rep(0 ,nsim) 

X[l]=rexp(l) #initialize the chain 

Y[l]=rexp(l) #initialize the chain 

for(i in 2:nsim){ #inversion method 

X [i] =-log ( 1-runif ( 1) * (1-exp (-B*Y [i-1] ) ) ) /Y [i-1] 
Y [i] =-log (1-runif ( 1 ) * ( 1-exp (-B*X [i] ) ) ) /X [i] 
> 

st=0. l*nsim 

inarge=function(x){ ( 1-exp (-B*x) )/x} 
rmiarge=f unction(x) { 

meirge (x) /integrate (marge , low=0 , up=B) $val} 
par(mfrow=c(l,2) ,mar=c(4,4,2,l)) 

hist(X,col="wheat2",breaks=25,xlab="",main=" ",prob=TRUE) 
curve (nmarge , add=T , lwd=2 , col=" sienna" ) 
plot(cumsum(X) [(st+1) :nsim]/c(l: (nsim-st)) ,type="l" , 
lwd=1.5,ylab=" ") 

where the simulation of the truncated exponential is done by inverting the 
cdf (and where the true marginal is represented against the histogram). 

Exercise 7.19 

Let us define 



9{x) 



- = y, 

X 



then we have 
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fviy) = fx ig-\y)) | ^g-'iv) \ 



which is the IG{a, b) density. 
Exercise 7.21 

Warning: The function rtnorm requires a predefined sigma that 

should be part of the arguments, as in 

rtnorin=f unction (n=l ,inu=0 , lo=-Inf ,up=Inf , sigma=l) . 

Since the rtnorm function is exact (within the precision of the qnorm and 
pnorm functions, the implementation in R is straightforward: 

hl=rtnonn(10"4,lo=-l,up=l) 
h2=rtnorm ( 1 ~4 , up= 1 ) 
h3=rtnoriii ( 10~4 , lo=3) 
par (mf row=c (1,3), inar=c (4,4,2,1)) 

hist(hl,freq=FALSE,xlab="x",xlim=c(-l,l) ,col="wheat2") 
dnorint=f unction (x) { dnorm (x) / (pnorm ( 1 ) -pnorm (- 1) ) } 
curve (dnormt , add=T , col=" sienna" ) 

hist (h2 , f req=FALSE , xlab="x" , xlim=c (-4,1), col="wheat2" ) 
dnormt=f unction (x) ■[ dnorm (x) /pnorm ( 1 ) } 
curve (dnormt , add=T , col=" sienna" ) 

hist (h3 , f req=FALSE , xlab="x" , xlim=c (3,5), col="wheat2" ) 
dnormt=f unction (x)-[ dnorm (x) /pnorm (-3)} 
curve (dnormt , add=T , col=" sienna" ) 

Exercise 7.23 



when Qfa is an integer, it is clearly possible to express t:{9i,92\x) as a 
sum of terms that arc products of a polynomial function of 0i and of 
a polynomial function of 62- It is therefore straightforward to integrate 
those terms in either 61 or 02. 



a. Since (j = 1, 2) 
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b. For the same reason as above, rewriting tt{9i,02\x) as a density in (^i, ^) 
leads to a product of polynomials in di , all of which can be expanded and 
integrated in 0i , producing in the end a sum of functions of the form 

namely a mixture of F densities. 

c. The Gibbs sampler based on (7.9) is available in the mcsm package. 

Exercise 7.25 

Warning: There is a typo in Example 7.3, sigma should be defined 
as sigma2 and signia2-[l]- should be sigina2 [1] ... 

a. In Example 7.2, since 9\x ~ Be{x + a,n — x-\-b), we have clearly E[^|a;] = 
{x + a)/{n + a + b) (with a missing parenthesis). The comparison between 
the empirical average and of the Rao-Blackwellization version is of the 
form 

plot(cumsum(T)/(l:Nsiiii) ,type="l" , col="grey50" , 

xlab=" iterations" ,ylab="" ,main="Example 7.2") 
lines (cumsumC (X+a) )/((!: Nsim) * (n+a+b) ) , col=" sienna" ) 

All comparisons are gathered in Figure 7.1. 

b. In Example 7.3, equation (7.4) defines two standard distributions as full 
conditionals. Since 7r(^|x, a^) is a normal distribution with mean and vari- 
ance provided two lines below, we obviously have 



E[6'|x,a^ 
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The modification in the R program follows 

plot (cuinsum(theta)/(l :Nsim) ,type="l" , col="grey50" , 
xlab=" iterations" ,ylab="" ,main="Example 7.3") 

ylab=" " ,main=" Example 7.3") 

lines (cumsum(B*thetaO+(l-B) *xbar) /(I : Nsim) ) , col=" sienna") 

c. The full conditionals of Example 7.5 given in Equation (7.7) are more 
numerous but similarly standard, therefore 

— rr^ 71 ■t'^ — 

follows from this decomposition, with the R lines added to the mcsm 
randomeff function 

plot (ciimsum(thetal) / (1 :nsim) ,type="l" , col="grey50" , 

xlab=" iterations" ,ylab="" ,main="Example 7.5") 
lines (cumsum ( (mu*sigma2+nl*tau2*xlbar) / (sigma2+nl*tau2) ) / 
(1 :nsim) ) , col=" sienna") 



7 Gibbs Samplers 67 



d. In Example 7.6, the complete-data model is a standard normal model with 

TTiX ~\~ [Tl — 771 1 Z 

variance one, hence E[^|a;,2:] = . The additional lines in 

n 

the R code are 

plot (cumsum(tliat)/(l :Nsini) ,type="l" , col="grey50" , 

xlab=" iterations" ,ylab="" ,main=" Example 7.6") 
lines (cumsiimC (m/n) *xbar+ ( l-m/n) *zbar) / ( 1 : Nsim) ) , 
col=" sienna") 

e. In Example 7.12, the full conditional on A, Xi\/3,ti,Xi ~ Q{xi + a,ti+ (3) 
and hence E[Xi\l3,ti,Xi\ = (xj + a)/{ti + /3). The corresponding addition 
in the R code is 

plot (cmnsum(lambda[, 1] )/(l :Nsim) ,type="l" , col="grey50" , 

xlab= " iterations " ,ylab="" ,main="Example 7. 12") 
lines (cumsumC (xdata [1] +alpha) / (Time [1] +beta) ) / (1 : Nsim) ) , 
col=" sienna") 



o 
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Fig. 7.1. Comparison of the convergences of the plain average with its Rao- 
Blackwellized counterpart for five different examples. The Rao-Blackwellized is plot- 
ted in sienna red and is always more stable than the original version. 
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Convergence Monitoring for MCMC 
Algorithms 



Exercise 8.1 

Warning: Strictly speaking, we need to assume that the Markov 
chain (a;*^*^) has a finite VEiriance for the h transform, since the 
assumption that E,f[h'^{X)] exists is not sufficient (see Meyn and 
Tweedie, 1993. 

This result was established by MacEachern and Berliner (1994). We have the 
proof detailed as Lemma 12.2 in Robert and Casella (2004) (with the same 
additional assumption on the convergence of the Markov chain missing!). 
Define 5^, . . . , S^~^ as the shifted versions of 6/- = S^.; that is, 

1 ^ 

'^fe y E '^(^^*'"'^)' i = 0,l,...,fc-l. 

The estimator can then be written as Ji = ^ YliZo hence 
var((5i) = var ^ 6ij 

<var(5^)/fc + ^ var(50)/fc2 

= var((5fc) , 

where the inequality follows from the Cauchy-Schwarz inequality 

|cov(5^,5i)|<var(50). 
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Exercise 8.3 

This is a direct appHcation of the Ergodic Theorem (see Section 6.2). If the 
chain (a;*-*') is ergodic, then the empirical average above converges (almost 
surely) to V.f[ip{X) lf{X)\ = 1/C. This assumes that the support oiip is small 
enough (see Exercise 4.13). For the variance of the estimator to be finite, a 
necessary condition is that 

E/[^(X)//(X)](x |^dx<oo. 

As in Exercise 8.1, we need to assume that the convergence of the Markov 
chain is regular enough to ensure a finite variance. 

Exercise 8.5 

The modified R program using bootstrap is 

ranoo=matr ix (0 , ncol=2 , nrow=25 ) 
for (j in 1:25){ 

batch=matrix(sample(beta, 100*Ts [j] ,rep=TRUE) ,ncol=100) 

sigmoo=2*sd(apply (batch, 2 , mean) ) 

ranoo [j , ] =mecin(beta [1 : Ts [j] ] )+c (-sigmoo ,+sigmoo) 

} 

polygon ( c (Ts ,rev(Ts) ) , c (ranoo [, 1] , rev (ranoo [,2] ) ) , col="grey") 
lines (cumsum(beta) / (1 :T) , col=" sienna" , lwd=2) 

and the output of the comparison is provided in Figure 8.1. 
Exercise 8.7 

Warning: Example 8.9 contains several typos, namely Yfe ^ J\f{9i,a'^) 
instead of ^ J\f{9i,a'^), the /i/s being also iid normal instead of the 9i's 
being also iid normal... 

Warning: Exercise 8.7 also contains a typo in that the posterior 
distribution on fi cannot be obtained in a closed form. It should 
read 

iShow that the posterior distribution on a in Example 8.9 can be obtained in 
a form.^||H[[^|^^|^^^|^^^|^m||||||||||||||[|^^^m||||||||[ 

Since 
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Fig. 8.1. Comparison of two evaluations of the variance of the MCMC estimate of 
the mean of /3 for the pump failure model of Example 8.6. 



-9 



(X a 



oc exp 



_1 r 18 



\i=l 



(which is also a direct consequence of the marginalization Yi ~ A/'(/i, a + cr^)), 
we have 
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a-3 r 2 

°^ 7 , 2\9 1 

[a + a^Y \ a 



1 + n(a + <T^)" 
2 



18 

/[^ - (a + a^)-! + n{a + a^)-^) 



1 18 / , 2^-2 / 18 

2{a + C72) ^ 2(1 + n(a + a^)-^) 

and thus 

, , , a-3(l + n(a + <72)-i)-i/2 ( 2 

7r(a y) oc — ■ — ^ exp <^ 

1 18 / , 2\-2 / 18 
l_y 2^ + 2 / 



Therefore the marginal posterior distribution on a has a closed (albeit com- 
plex) form. (It is also obvious from Tr{a, /Lt|y) above that the marginal posterior 
on ij does not have a closed form.) 

The baseball dataset can be found in the amcmc package in the baseball . c 
program and rewritten as 

baseball=c (0 . 395 , . 375 , . 355 , . 334 , . 313 , . 313 , . 291 , 

0.269,0.247,0.247,0.224,0.224,0.224,0.224,0.224,0.200, 

0.175,0.148) 

The standard Gibbs sampler is implemented by simulating 

^fa~^IJ + a~^yi _■, _2x-i\ 

\ 1 + na ^ I 

a\e, ~ ^11, 2 + Y,{9, - m) V2 j 

which means using an R loop like 
Nsim=10~4 

sigma2=0 . 00434 ; sigmain=l/ sigma2 
thet a=rnorm (18) 
mu=rep(rnorm(l) ,Nsim) 
alpha=rep(rexp(l) ,Nsiin) 
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for (t in 2:Nsim){ 

theta=rnorm(18,meaii=(mu [t-1] / alpha [t-1] +sigmain*baseball)/ 
(1/ alpha [t-1] +sigmain) , sd=l/sqrt ( 1/alpha [t-1] +sigmam) ) 
mu [t] =rnorm(l ,inecin=sum(theta) / (1/alpha [t-1] +n) , 

sd=l/sqrt (1+n/alpha [t-1] ) ) 
alpha [t] = (2+0 . 5*suin( (theta-mu [t] ) "2) ) /rgammad , 11) 

} 

The result of both coda diagnostics on a is 

> heidel.diagdncmc (alpha)) 

Stationarity staxt p-value 
test iteration 
varl passed 1 0.261 

Halfwidth Mean Halfwidth 

test 

varl passed 0.226 0.00163 

> geweke . diag (mcmc ( alpha) ) 

Fraction in 1st window =0.1 
Fraction in 2nd window = 0.5 

varl 
-0.7505 

If we reproduce the Kolmogorov-Smirnov analysis 

ks=NULL 
M=10 

for (t in seq(Nsim/10,Nsim,le=100)){ 
alphal=alpha[l : (t/2)] 
alpha2=alpha[(t/2)+(l: (t/2))] 
alphal=alphal [seq(l,t/2,by=M)] 
alpha2=alpha2 [seq ( 1 , t /2 , by=M) ] 
ks=c (ks , ks . test (alphal , alpha2) $p) 
} 

Plotting the vector ks by plot(ks,pch=19) shows no visible pattern that 
would indicate a lack of uniformity. 

Comparing the output with the true target in a follows from the definition 

marge=f unction(alpha) { 

(alpha" (-3) / (sqrt (1+18* (alpha+sigina2) " (-1) ) * (alpha+sigma2) '9) ) * 

exp(-(2/alpha) - (.5/(alpha+sigma2))*sum(baseball"2) + 

. 5* (alpha+sigma2) " (-2) *suin (baseball) "2/ (1+n* (alpha+sigma2) ~ (-1) ) ) 

> 
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Figure 8.2 shows the fit of the simulated histogram to the above function 
(when normalized by integrate). 




Exercise 8.9 

a. We simply need to check that this transition kernel K satisfies the detailed 
balance condition (6.3), f{x)K{y\x) — f{y)K{x\y) when / is the Be{a, 1) 
density: when x ^ 

f{x)K{x, y) = aa;"~i x{a + l) y" 
= a{a + l){xy)'' 
^f{y)K{y,x) 

so the Se(a, 1) distribution is indeed stationary. 

b. Simulating the Markov chain is straightforward: 

alpha= . 2 
Nsim=10"4 

x=rep(runif (1) ,Nsim) 
y=rbeta(Nsim, alpha+1 , 1) 
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for (t in 2:Msim){ 

if (runif (l)<x[t-l]) x[t]=y[t] 

else x[t]=x[t-l] 

> 

and it exhibits a nice fit to the beta Be{a, 1) target. However, running 
cumuplot shows a lack of concentration of the distribution, while the two 
standard stationarity diagnoses are 

> heidel . diagCmcmc (x) ) 



Halfwidth Mean Halfwidth 
test 

varl failed 0.225 0.0366 

> geweke.diag(incmc(x)) 

Fraction in 1st window =0.1 
Fraction in 2nd window = 0.5 

varl 
3.277 

are giving dissonant signals. The ef f ectiveSize (memo (x) ) } is then equal 
to 329. Moving to 10^ simulations does not modify the picture (but may 

cause your system to crash!) 
c. The corresponding Metropolis-Hastings version is 

alpha= . 2 
Nsim=10~4 

x=rep (runif (1) ,Nsim) 
y=rbeta(Nsiiii, alpha+1 , 1) 
for (t in 2:Nsiin){ 

if (runif (l)<x[t-l]/y[t]) x[t]=y[t] 

else x[t]=x[t-l] 

> 

It also provides a good fit and also fails the test: 

> heidel. diag(incmc(x)) 



Stationarity start 
test iteration 
varl passed 1001 



p-value 



0.169 



Stationarity start p-value 
test iteration 
varl passed 1001 0.0569 
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Halfwidth Mean Halfwidth 

test 

varl failed 0.204 0.0268 

> geweke.diagdncmcCx)) 

Fraction in 1st window =0.1 
Fraction in 2nd window = 0.5 

varl 
1.736 

Exercise 8.11 

a. A possible R definition of the posterior is 

postit=f unction (beta, sigma2) { 

prod(pnonn(r [d==l] *beta/sigina2) ) *prod(pnorin(-r [d==0] *beta/sigina2) ) * 
dnorm (beta,sd=5) *dgaiiima ( 1 / s igma2 ,2,1)} 

and a possible R program is 

r=Pima. tr$ped 

d=as . numeric (Pima . tr$type) -1 
mod=suBimary (glm(d~r-l ,f amily="binomial") ) 
beta=rep(mod$coef [1] ,Nsim) 
sigma2=rep(l/runif (1) ,Nsim) 
for (t in 2:Nsim){ 

prop=beta[t-l] +rnorm(l , sd=sqrt (sigma2 [t-1] *mod$cov. unsealed) ) 

if (runif (l)<postit(prop,sigma2[t-l] )/postit (beta [t-1] , 
sigma2 [t-1] ) ) beta[t]=prop 

else beta [t] =beta [t-1] 

prop=exp(log(sigma2 [t-1] )+rnorm(l) ) 

if (runif (1) <sigma2 [t-1] *postit (beta[t] ,prop)/(prop* 

postit(beta[t] , sigma2 [t-1] ) ) ) sigma2 [t] =prop 
else sigma2[t]=sigma2[t-l] 
} 

(Note the Jacobian in the acceptance probability.) 

b. Running 5 chains in parallel is easily programmed with an additional loop 
in the above. Running gelman.diag on those five chains then produces a 
convergence assessment: 

> gelmcai.diag(mcmc.list(mcmc(betal) ,mcmc(beta2) ,mcmc(beta3) , 
+ mcmc(beta4) ,mcmc(beta5))) 

Potential scale reduction factors: 

Point est. 97.5% quantile 
[1,] 1.02 1.03 



Note also the good mixing behavior of the chain: 
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> effectiveSize(mcmc.list(mcmc(betal) ,inciiic(beta2) , 
+ mcmcCbetaS) ,inciiic(beta4) ,mcmc(beta5))) 

varl 
954.0543 

c. The implementation of the traditional Gibbs sampler with completion 

is detailed in Marin and Robert (2007), along with the appropriate R 
program. The only modification that is needed for this problem is the 
introduction of the non-identifiable scale factor cr^. 

Exercise 8.13 

In the kscheck.R program available in mcsm, you can modify G by changing 
the variable M in 

subbeta=beta[seq(l ,T,by=M)] 
subold=oldbeta [seq( 1 , T , by=M) ] 
ks=NULL 

for (t in seq((T/(10*M)) , (T/M) ,le=100)) 

ks=c (ks ,ks . test (subbetaCl : t] , subold [1 : t] ) $p) 

(As noted by a reader, the syntax ks=c(ks,res) is very inefficient in system 
time, as you can check by yourself.) 



Exercise 8.15 

Since the Markov chain (^(*)) is converging to the posterior distribution (in 
distribution), the density at time t, iTt, is also converging (pointwise) to the 
posterior density ■it{9\x), therefore Ut is converging to 

/(a;|6'(°°))7r(6'(°°)) 

7-7 — rr— = mix) , 

for all values of 0^°°). (This is connected with Chib's (1995) method, discussed 
in Exercise 7.16.) 



Exercise 8.17 

If we get back to Example 8.1, the sequence beta can be checked in terms of 
effective sample via an R program like 

ess=rep(l,T/10) 

for (t in 1:(T/10)) ess [t] =ef f ectiveSize(beta[l : (10*t)] ) 

where the subsampling is justified by the computational time required by 
ef f ectiveSize. The same principle can be applied to any chain produced by 
an MCMC algorithm. 

Figure 8.3 compares the results of this evaluation over the first three exam- 
ples of this chapter. None of them is strongly conclusive about convergence... 
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Fig. 8.3. Evolution of the effective sample size across iterations for the first three 
examples of Chapter 8. 
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